feat(llmo): async URL CSV export for agentic traffic #2401
Open
akshaymagapu wants to merge 10 commits into
Conversation
Adds POST/GET endpoints for asynchronous URL-level CSV exports of the
agentic traffic dashboard:
POST /sites/:siteId/agentic-traffic/urls/export
GET /sites/:siteId/agentic-traffic/urls/export/:exportId
Flow:
UI clicks export
-> API endpoint receives filters
-> API computes deterministic exportId (sha256 of canonical filter set)
-> API checks S3 cache (S3_REPORT_BUCKET, agentic-traffic/url-exports/...)
-> if CSV present + metadata=success: return presigned URL(s) (200 ready)
-> if metadata=failed: return 200 failed + reason
-> if metadata=processing: return 202 processing
-> otherwise enqueue SQS job (REPORT_JOBS_QUEUE_URL) -> 202 processing
Worker handles the SQS job, calls the data-service RPC
(wrpc_agentic_traffic_urls_export_to_s3), and writes metadata.json.
UI polls the status endpoint until ready/failed.
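The cache-check branching above can be sketched as a small decision helper. This is an illustrative sketch, not the PR's actual code: `resolveExportAction` and its return shape are assumed names, standing in for the handler's metadata inspection.

```javascript
// Sketch of the POST handler's decision flow described above, assuming a
// prior step has fetched metadata.json from S3 (null when absent).
// Names (resolveExportAction, action values) are hypothetical.
function resolveExportAction(metadata) {
  if (metadata === null) return { status: 202, action: 'enqueue' };          // cache miss -> SQS job
  if (metadata.status === 'success') return { status: 200, action: 'presign' };       // 200 ready
  if (metadata.status === 'failed') return { status: 200, action: 'report-failure' }; // 200 failed + reason
  return { status: 202, action: 'poll' };                                    // processing
}
```

The UI treats any 202 as "keep polling" and a 200 as terminal (download links or a failure reason).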
Implementation notes:
- Filter set is canonicalized (version + siteId + startDate/endDate +
platform/categoryName/agentType/userAgent/contentType/successRate/
urlPathSearch + format) and stable-stringified before hashing, so the
same filters always produce the same exportId regardless of JSON key
order. Same filters -> same S3 key -> cache hit on retry.
- Listing handles Aurora's split-file convention: when query_export_to_s3
splits a large export, additional objects appear as urls.csv_part2 /
urls.csv_part3 / ... alongside urls.csv. The list step returns them in
stable part order so the presigned-URL array matches.
- Presigned URLs expire after 7 days (the SQS-driven export is async and
the user may walk away from the polling tab).
- Export bucket/queue/region resolve from env in priority order:
AGENTIC_TRAFFIC_EXPORT_BUCKET > S3_REPORT_BUCKET > ctx.s3.s3Bucket, and
AGENTIC_TRAFFIC_EXPORT_QUEUE_URL > REPORT_JOBS_QUEUE_URL. Missing
config returns 400 with a descriptive message rather than 500.
- parseAgenticTrafficParams now captures urlPathSearch (already supported
by the data-service by-url RPC). Other handlers ignore the extra
field; only the export hashes it into the exportId.
OpenAPI:
- New AgenticTrafficUrlsExportRequest / AgenticTrafficUrlsExportResponse
schemas.
- New llmo-api paths for export + status with 200/202 distinction.
Tests:
- 10 controller tests covering cache hit, queueing, processing
short-circuit, platform-code mapping into the hash, missing config
rejection, status processing/ready/failed states, split-part presigning,
and exportId-shape validation.
- Routes index test updated to include the new endpoints in both the
controller mock and the route listing.
Requires:
- spacecat-infrastructure: aurora s3Export role association (PR #518)
- mysticat-data-service: wrpc_agentic_traffic_urls_export_to_s3 RPC
- spacecat-reporting-worker: agentic-traffic-urls-export SQS handler
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reporting-worker (spacecat-reporting-worker#616) resolves its
allowed bucket from S3_REPORTING_BUCKET_NAME (the existing reports
bucket env in that Lambda's environment). The API service was only
checking S3_REPORT_BUCKET — fine in envs where both env vars resolve
to the same bucket, but a mismatch in any env where only one is set
would make the worker reject the SQS message with 's3Bucket must
match the configured export bucket'.
Add S3_REPORTING_BUCKET_NAME as an additional fallback in the API's
getExportConfig so both names work and resolution stays consistent
with the worker regardless of which env var the deploy config sets.
Order: AGENTIC_TRAFFIC_EXPORT_BUCKET (preferred, dedicated)
→ S3_REPORTING_BUCKET_NAME (worker's name)
→ S3_REPORT_BUCKET (older name, some envs still have it)
→ ctx.s3.s3Bucket (SDK default).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API service convention is S3_REPORT_BUCKET. The worker has its own
S3_REPORTING_BUCKET_NAME convention in its Lambda env; both names
resolve to the same spacecat-{env}-reports bucket at deploy time so
cross-service consistency isn't an issue. Reverting the extra fallback
to keep each repo using its native env-var name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR will trigger a minor release when merged.
CI flagged two uncovered branches in
createAgenticTrafficUrlsExportStatusHandler that the existing tests
missed:
- lines 748-749: `if (!hasText(s3Bucket)) return badRequest(...)` —
missing-config branch on the GET endpoint (the POST endpoint
version was already tested, but the GET version wasn't).
- lines 776-778: the `catch (error)` block — unexpected S3 PUT/GET
failure during status check.
Two added tests close both. Project-wide coverage threshold is 100%
and the prior failure was at 99.96% / 99.89% / 99.96% — these tests
push it back to 100%.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST is the user's "I want this export" signal. Returning the prior failure verbatim permanently locked the cache key for the same filter set — users had to either manually delete metadata.json from S3 or tweak filters to change the exportId before they could retry.

Drop the isExportFailed early-return from the POST handler so failed metadata falls through to the enqueue path, identical to "no metadata". The worker overwrites the failed metadata.json on the retry; the cache contract still holds (same filters → same exportId → same S3 key), so retries are free of side effects.

GET keeps the explicit 'failed' branch — status polling is a pure read; reporting the failure is still its job.

Test added asserting the new POST-with-failed-metadata behavior. 103 → 104 passing. OpenAPI POST description updated to document the new status semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
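The revised POST semantics from this commit can be sketched as a three-way decision. `postDecision` is an illustrative name; the point is that 'failed' and "no metadata" now take the same path.

```javascript
// POST decision after this change: failed metadata falls through to the
// enqueue path exactly like a cache miss, so retries "just work".
// GET is unchanged and still reports 'failed'. Hypothetical names.
function postDecision(metadata) {
  if (metadata?.status === 'success') return 'presign';   // 200 ready
  if (metadata?.status === 'processing') return 'poll';   // 202 processing
  return 'enqueue'; // null OR 'failed' -> worker overwrites metadata.json
}
```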
…be/spacecat-api-service into feat/agentic-traffic-urls-export-api
CI flagged two more uncovered branches in llmo-agentic-traffic.js:
- lines 712-714: catch (error) in the POST handler — unexpected S3
or SQS error inside the try block. Added a test that rejects the
S3 send stub and asserts 500 + the log line.
- lines 740-741: missing s3Client / ListObjectsV2Command /
GetObjectCommand / getSignedUrl guard on the GET status handler.
Added a test that strips ctx.s3 and asserts 400.
Tests 104 → 106 passing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI was still at 99.6% lines / 96.97% branches with two uncovered ranges:
- Lines 160-161: stableStringify array branch. The canonical export payload is a flat object of primitives — the array branch was unreachable. Removed; the recursive object branch is sufficient.
- Lines 663-664: POST `!hasText(s3Bucket) || !hasText(queueUrl)` — the second config check, which trips after the `s3?.s3Client` guard. The existing 'not configured' test strips s3/sqs entirely and trips the earlier check, so this branch was never exercised. Added a test where the S3/SQS SDKs are present but env vars / s3Bucket are stripped.

107 tests passing; targeted coverage on llmo-agentic-traffic.js shows 100% statements/lines/functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously verified 100% statements/lines/functions but missed that branches were still at 97.58%. CI's 100% threshold caught it. Going through every reachable `||` / `??` fallback now. Tests added (107 → 113):
- s3Region defaulting to 'us-east-1' when AGENTIC_TRAFFIC_EXPORT_REGION and ctx.runtime.region are both missing, plus the requestedBy fallback to 'unknown' when ctx.attributes.authInfo.profile.email is absent.
- ListObjectsV2 returning a response without the Contents field — exercises the `result.Contents || []` defense.
- success metadata that lacks rowCount/filesUploaded/bytesUploaded — exercises the `?? null` and `?? csvKeys.length` paths.
- failed metadata without failureReason — exercises the 'Export failed' default reason.
- GET status with undefined ctx.params.exportId — distinct from the 'not-a-hash' truthy-but-invalid case; covers `exportId || ''`.
- GetObject error shaped as `error.$metadata.httpStatusCode = 404` rather than `error.name = 'NoSuchKey'` — both are treated as a missing-metadata signal.

Dead-branch annotation: the `Number(key.match(...)?.[1] || 1)` fallback in listExportCsvObjects' sort comparator is unreachable — listExportCsvObjects' own filter already guarantees keys end in `_partN` for the non-csvKey branch. Added `/* c8 ignore next */` with an explanatory comment so the dead branch is documented rather than silently failing coverage.

Local verify before pushing this time: 100% statements / branches / functions / lines on llmo-agentic-traffic.js.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report ✅ All modified and coverable lines are covered by tests.
What

Two endpoints for the agentic-traffic URL Performance dashboard's async CSV export:
- `POST /sites/:siteId/agentic-traffic/urls/export`
- `GET /sites/:siteId/agentic-traffic/urls/export/:exportId`

How

POST canonicalises the filter set, hashes it into a deterministic `exportId`, and checks S3 first. Same filters → same key → cache hit. On a miss, an SQS message is enqueued and the reporting-worker (spacecat-reporting-worker#616) runs the export via the data-service RPC (mysticat-data-service#589). The user polls GET until `metadata.json` flips to `success` (presigned download URLs) or `failed` (reason).

Design notes

- `exportId = sha256(stableStringify(canonical filter set))` — order-stable JSON serialisation guarantees identical filters always produce the same key.
- Aurora may split large exports into `urls.csv` + `urls.csv_part2` + …; `listExportCsvObjects` returns them in stable part order.

Config

- Bucket: `AGENTIC_TRAFFIC_EXPORT_BUCKET` → `S3_REPORT_BUCKET` → `ctx.s3.s3Bucket`
- Queue: `AGENTIC_TRAFFIC_EXPORT_QUEUE_URL` → `REPORT_JOBS_QUEUE_URL`
- Region: `AGENTIC_TRAFFIC_EXPORT_REGION` → `ctx.runtime.region` → `us-east-1`

Today the dedicated env vars are unset; fallbacks resolve to the existing report bucket / queue.

OpenAPI

New `AgenticTrafficUrlsExportRequest` / `AgenticTrafficUrlsExportResponse` schemas; new paths in `llmo-api.yaml`; `agentic-traffic-by-url-api.md` updated.

Tests

103 passing in `llmo-agentic-traffic.test.js`. Covers cache hit, cache miss + SQS enqueue, processing short-circuit, platform-code mapping into the hash, missing-config rejection (POST + GET), split-part presigning, exportId-shape validation, S3-error fallthrough.

Related