
feat(llmo): async URL CSV export for agentic traffic#2401

Open
akshaymagapu wants to merge 10 commits into main from feat/agentic-traffic-urls-export-api

Conversation


@akshaymagapu akshaymagapu commented May 12, 2026

What

Two endpoints for the agentic-traffic URL Performance dashboard's async CSV export:

POST /sites/:siteId/agentic-traffic/urls/export
GET  /sites/:siteId/agentic-traffic/urls/export/:exportId

How

POST canonicalises the filter set, hashes it into a deterministic exportId, and checks S3 first. Same filters → same key → cache hit. On a miss, an SQS message is enqueued and the reporting-worker (spacecat-reporting-worker#616) runs the export via the data-service RPC (mysticat-data-service#589). The user polls GET until metadata.json flips to success (presigned download URLs) or failed (reason).

POST  ── ListObjectsV2 + GetObject(metadata.json)
        ├─ status=success    →  200 ready    + presigned URLs
        ├─ status=processing →  202 processing
        └─ status=failed or no metadata → sqs.sendMessage → 202 processing

GET   ── same S3 cache check, no SQS path; pins exportId to /^[a-f0-9]{64}$/
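The branch logic above can be sketched as a pure decision function. This is an illustrative sketch only — `resolveExportResponse` and the return shape are assumed names, not the actual handler, which also performs the S3 and SQS calls:

```javascript
// Maps the metadata.json state to the HTTP response and whether a new
// SQS export job should be enqueued. Failed metadata re-enqueues (retry);
// the worker overwrites the stale metadata.json at the same S3 key.
function resolveExportResponse(metadata) {
  if (!metadata) {
    return { httpStatus: 202, status: 'processing', enqueue: true };
  }
  switch (metadata.status) {
    case 'success':
      return { httpStatus: 200, status: 'ready', enqueue: false };
    case 'processing':
      return { httpStatus: 202, status: 'processing', enqueue: false };
    case 'failed':
    default:
      return { httpStatus: 202, status: 'processing', enqueue: true };
  }
}
```

GET reuses the same read path but never enqueues, and reports `failed` explicitly.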

Design notes

  • exportId = sha256(stableStringify(canonical filter set)) — order-stable JSON serialisation guarantees identical filters always produce the same key.
  • Aurora may split large exports into urls.csv + urls.csv_part2 + …; listExportCsvObjects returns them in stable part order.
  • Presigned URLs expire after 7 days.
  • Status polling reads only from S3 — no DB round-trip, no writer-pool pressure.

Config

Env var                           Fallback
AGENTIC_TRAFFIC_EXPORT_BUCKET     S3_REPORT_BUCKET → ctx.s3.s3Bucket
AGENTIC_TRAFFIC_EXPORT_QUEUE_URL  REPORT_JOBS_QUEUE_URL
AGENTIC_TRAFFIC_EXPORT_REGION     ctx.runtime.region → us-east-1

Today the dedicated env vars are unset; fallbacks resolve to the existing report bucket / queue.
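The fallback chain could look roughly like this — `getExportConfig` matches the helper name mentioned in later commits, but this exact shape is an assumption:

```javascript
// Resolves export config from env vars, preferring the dedicated
// AGENTIC_TRAFFIC_EXPORT_* names and falling back to the shared
// report bucket / queue, then to SDK/runtime defaults.
function getExportConfig(env, ctx) {
  return {
    s3Bucket: env.AGENTIC_TRAFFIC_EXPORT_BUCKET
      || env.S3_REPORT_BUCKET
      || ctx?.s3?.s3Bucket,
    queueUrl: env.AGENTIC_TRAFFIC_EXPORT_QUEUE_URL
      || env.REPORT_JOBS_QUEUE_URL,
    s3Region: env.AGENTIC_TRAFFIC_EXPORT_REGION
      || ctx?.runtime?.region
      || 'us-east-1',
  };
}
```

Missing config (no bucket or queue after the chain) is rejected with a 400 and a descriptive message rather than surfacing as a 500.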

OpenAPI

New AgenticTrafficUrlsExportRequest / AgenticTrafficUrlsExportResponse schemas; new paths in llmo-api.yaml; agentic-traffic-by-url-api.md updated.

Tests

113 passing in llmo-agentic-traffic.test.js. Covers cache hit, cache miss + SQS enqueue, failed-metadata retry, processing short-circuit, platform-code mapping into the hash, missing-config rejection (POST + GET), split-part presigning, exportId-shape validation, S3-error fallthrough, and the fallback/default branches.

Related

akshaymagapu and others added 3 commits May 12, 2026 16:02
Adds POST/GET endpoints for asynchronous URL-level CSV exports of the
agentic traffic dashboard:

  POST /sites/:siteId/agentic-traffic/urls/export
  GET  /sites/:siteId/agentic-traffic/urls/export/:exportId

Flow:

  UI clicks export
    -> API endpoint receives filters
    -> API computes deterministic exportId (sha256 of canonical filter set)
    -> API checks S3 cache (S3_REPORT_BUCKET, agentic-traffic/url-exports/...)
    -> if CSV present + metadata=success: return presigned URL(s) (200 ready)
    -> if metadata=failed: return 200 failed + reason
    -> if metadata=processing: return 202 processing
    -> otherwise enqueue SQS job (REPORT_JOBS_QUEUE_URL) -> 202 processing

  Worker handles the SQS job, calls the data-service RPC
  (wrpc_agentic_traffic_urls_export_to_s3), and writes metadata.json.

  UI polls the status endpoint until ready/failed.

Implementation notes:

- Filter set is canonicalized (version + siteId + startDate/endDate +
  platform/categoryName/agentType/userAgent/contentType/successRate/
  urlPathSearch + format) and stable-stringified before hashing, so the
  same filters always produce the same exportId regardless of JSON key
  order. Same filters -> same S3 key -> cache hit on retry.

- Listing handles Aurora's split-file convention: when query_export_to_s3
  splits a large export, additional objects appear as urls.csv_part2 /
  urls.csv_part3 / ... alongside urls.csv. The list step returns them in
  stable part order so the presigned-URL array matches.

- Presigned URLs expire after 7 days (the SQS-driven export is async and
  the user may walk away from the polling tab).

- Export bucket/queue/region resolve from env in priority order:
  AGENTIC_TRAFFIC_EXPORT_BUCKET > S3_REPORT_BUCKET > ctx.s3.s3Bucket, and
  AGENTIC_TRAFFIC_EXPORT_QUEUE_URL > REPORT_JOBS_QUEUE_URL. Missing
  config returns 400 with a descriptive message rather than 500.

- parseAgenticTrafficParams now captures urlPathSearch (already supported
  by the data-service by-url RPC). Other handlers ignore the extra
  field; only the export hashes it into the exportId.
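The split-file ordering described above can be sketched like this — a minimal sketch with illustrative helper names; the real listExportCsvObjects also performs the ListObjectsV2 call:

```javascript
// Aurora's query_export_to_s3 writes urls.csv, then urls.csv_part2,
// urls.csv_part3, ... for large exports. The base file is part 1.
function partNumber(key) {
  if (key.endsWith('urls.csv')) return 1;
  const m = key.match(/_part(\d+)$/);
  return m ? Number(m[1]) : 1; // fallback unreachable after the filter below
}

// Keeps only the export's CSV objects and returns them in stable
// part order, so the presigned-URL array matches the part sequence.
function sortCsvParts(keys) {
  return keys
    .filter((k) => k.endsWith('urls.csv') || /urls\.csv_part\d+$/.test(k))
    .sort((a, b) => partNumber(a) - partNumber(b));
}
```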

OpenAPI:

- New AgenticTrafficUrlsExportRequest / AgenticTrafficUrlsExportResponse
  schemas.
- New llmo-api paths for export + status with 200/202 distinction.

Tests:

- 10 controller tests covering cache hit, queueing, processing
  short-circuit, platform-code mapping into the hash, missing config
  rejection, status processing/ready/failed states, split-part presigning,
  and exportId-shape validation.
- Routes index test updated to include the new endpoints in both the
  controller mock and the route listing.

Requires:
- spacecat-infrastructure: aurora s3Export role association (PR #518)
- mysticat-data-service: wrpc_agentic_traffic_urls_export_to_s3 RPC
- spacecat-reporting-worker: agentic-traffic-urls-export SQS handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reporting-worker (spacecat-reporting-worker#616) resolves its
allowed bucket from S3_REPORTING_BUCKET_NAME (the existing reports
bucket env in that Lambda's environment). The API service was only
checking S3_REPORT_BUCKET — fine in envs where both env vars resolve
to the same bucket, but a mismatch in any env where only one is set
would make the worker reject the SQS message with 's3Bucket must
match the configured export bucket'.

Add S3_REPORTING_BUCKET_NAME as an additional fallback in the API's
getExportConfig so both names work and resolution stays consistent
with the worker regardless of which env var the deploy config sets.

Order: AGENTIC_TRAFFIC_EXPORT_BUCKET (preferred, dedicated)
     → S3_REPORTING_BUCKET_NAME (worker's name)
     → S3_REPORT_BUCKET (older name, some envs still have it)
     → ctx.s3.s3Bucket (SDK default).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API service convention is S3_REPORT_BUCKET. The worker has its own
S3_REPORTING_BUCKET_NAME convention in its Lambda env; both names
resolve to the same spacecat-{env}-reports bucket at deploy time so
cross-service consistency isn't an issue. Reverting the extra fallback
to keep each repo using its native env-var name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This PR will trigger a minor release when merged.

CI flagged two uncovered branches in
createAgenticTrafficUrlsExportStatusHandler that the existing tests
missed:

  - lines 748-749: `if (!hasText(s3Bucket)) return badRequest(...)` —
    missing-config branch on the GET endpoint (the POST endpoint
    version was already tested, but the GET version wasn't).
  - lines 776-778: the `catch (error)` block — unexpected S3 PUT/GET
    failure during status check.

Two added tests close both. Project-wide coverage threshold is 100%
and the prior failure was at 99.96% / 99.89% / 99.96% — these tests
push it back to 100%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@akshaymagapu changed the title from "feat(llmo): async S3-backed CSV export for agentic URL traffic" to "feat(llmo): async URL CSV export for agentic traffic" on May 12, 2026
akshaymagapu and others added 4 commits May 12, 2026 16:36
POST is the user's "I want this export" signal. Returning the prior
failure verbatim permanently locked the cache key for the same filter
set — users had to either manually delete metadata.json from S3 or
tweak filters to change the exportId before they could retry.

Drop the isExportFailed early-return from the POST handler so failed
metadata falls through to the enqueue path, identical to "no metadata".
The worker overwrites the failed metadata.json on the retry; the cache
contract still holds (same filters → same exportId → same S3 key) so
retries are free of side effects.

GET keeps the explicit 'failed' branch — status polling is a pure
read; reporting the failure is still its job.

Test added asserting the new POST-with-failed-metadata behavior. 103 →
104 passing.

OpenAPI POST description updated to document the new status semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…be/spacecat-api-service into feat/agentic-traffic-urls-export-api
CI flagged two more uncovered branches in llmo-agentic-traffic.js:

  - lines 712-714: catch (error) in the POST handler — unexpected S3
    or SQS error inside the try block. Added a test that rejects the
    S3 send stub and asserts 500 + the log line.
  - lines 740-741: missing s3Client / ListObjectsV2Command /
    GetObjectCommand / getSignedUrl guard on the GET status handler.
    Added a test that strips ctx.s3 and asserts 400.

Tests 104 → 106 passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@akshaymagapu akshaymagapu added the enhancement New feature or request label May 12, 2026
akshaymagapu and others added 2 commits May 12, 2026 17:40
CI was still at 99.6% lines / 96.97% branches with two uncovered ranges:

- Lines 160-161: stableStringify array branch. The canonical export
  payload is a flat object of primitives — the array branch was
  unreachable. Removed; the recursive object branch is sufficient.
- Lines 663-664: POST !hasText(s3Bucket) || !hasText(queueUrl) — the
  second config-check that trips after the s3?.s3Client guard. The
  existing 'not configured' test strips s3/sqs entirely and trips the
  earlier check, so this branch was never exercised. Added a test where
  S3/SQS SDKs are present but env vars / s3Bucket are stripped.

107 tests passing; targeted coverage on llmo-agentic-traffic.js shows
100% statements/lines/functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously verified 100% statements/lines/functions but missed that
branches were still at 97.58%. CI's 100% threshold caught it. Going
through every reachable `||` / `??` fallback now.

Tests added (107 → 113):
- defaults s3Region to 'us-east-1' when AGENTIC_TRAFFIC_EXPORT_REGION
  and ctx.runtime.region are both missing, plus requestedBy fallback
  to 'unknown' when ctx.attributes.authInfo.profile.email is absent.
- ListObjectsV2 returning a response without the Contents field —
  exercises the `result.Contents || []` defense.
- success metadata that lacks rowCount/filesUploaded/bytesUploaded —
  exercises the `?? null` and `?? csvKeys.length` paths.
- failed metadata without failureReason — exercises the 'Export
  failed' default reason.
- GET status with undefined ctx.params.exportId — distinct from the
  'not-a-hash' truthy-but-invalid case; covers `exportId || ''`.
- GetObject error shaped as `error.$metadata.httpStatusCode = 404`
  rather than `error.name = 'NoSuchKey'` — both are treated as a
  missing-metadata signal.
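The two error shapes above can be collapsed by a small predicate — `isMissingMetadataError` is an illustrative name, not necessarily the helper in the diff:

```javascript
// Treats both SDK error shapes as "no metadata.json yet":
// a NoSuchKey error name, or a bare 404 in $metadata.
function isMissingMetadataError(error) {
  return error?.name === 'NoSuchKey'
    || error?.$metadata?.httpStatusCode === 404;
}
```

Any other error shape falls through to the catch-all S3-error handling instead of being mistaken for a cache miss.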

Dead-branch annotation:
- The `Number(key.match(...)?.[1] || 1)` fallback in
  listExportCsvObjects' sort comparator is unreachable —
  listExportCsvObjects' own filter already guarantees keys end in
  `_partN` for the non-csvKey branch. Added `/* c8 ignore next */`
  with an explanatory comment so the dead branch is documented rather
  than silently failing coverage.

Local verify before pushing this time: 100% statements / branches /
functions / lines on llmo-agentic-traffic.js.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

