docs: ADR-002 preflight API REST redesign by GeezerPelletier · Pull Request #2377 · adobe/spacecat-api-service

GeezerPelletier · 2026-05-08T18:26:26Z

Summary

Adds docs/decisions/002-preflight-api-rest-redesign.md capturing the decision to replace /preflight/beta/jobs with site-scoped REST endpoints under /sites/:siteId/preflights, and to deprecate both legacy endpoint pairs
No code changes - this is a decision record for team review before implementation begins

Related: SITES-44290 | Bulk endpoint deferred to: SITES-44432

What's being proposed

POST /sites/:siteId/preflights

Request body:

{
  "url": "https://main--site--org.hlx.page/some-path"
}

promiseToken passed via cookie for authenticated CMS pages. createdBy is derived server-side from the caller's IMS profile and never supplied by the client. step is removed - Mysticat always runs the full identify/suggest flow. The submitted url must match one of the site's known hostnames (base URL, preview URL, or live URL) - a mismatch returns PREFLIGHT_INVALID_REQUEST.

Response 202 Accepted:

Headers:

Location: https://spacecat.experiencecloud.live/api/v1/sites/{siteId}/preflights/{preflightId}

Body:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "IN_PROGRESS",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": {
    "id": "ABC123@techacct.adobe.com",
    "displayName": "John Doe"
  }
}

Error responses:

Status	`errorCode`	Condition
`400 Bad Request`	`PREFLIGHT_INVALID_REQUEST`	`url` is missing, not a valid URI, or does not belong to the site identified by `:siteId`
`403 Forbidden`	`PREFLIGHT_ACCESS_DENIED`	Caller does not have access to the site
`403 Forbidden`	`PREFLIGHT_NOT_ENABLED`	Preflight is not enabled for the site
`404 Not Found`	`PREFLIGHT_SITE_NOT_FOUND`	`siteId` does not exist
`502 Bad Gateway`	`PREFLIGHT_UPSTREAM_ERROR`	Mysticat returned a 5xx response
`500 Internal Server Error`	`PREFLIGHT_INTERNAL_ERROR`	Unexpected error within this service

Error response body:

{
  "errorCode": "PREFLIGHT_NOT_ENABLED",
  "message": "Preflight is not enabled for this site"
}

No job record is created for 400, 403, or 404 responses.

GET /sites/:siteId/preflights

Response 200 OK - grouped by URL, capped at 50 most recent preflights per site (sorted by createdAt desc across all URLs before grouping):

[
  {
    "url": "https://main--site--org.hlx.page/some-path",
    "preflights": [
      {
        "preflightId": "3fa85f64-...",
        "status": "COMPLETED",
        "createdAt": "2026-05-11T10:00:00.000Z",
        "updatedAt": "2026-05-11T10:00:05.000Z",
        "createdBy": { "id": "ABC123@techacct.adobe.com", "displayName": "John Doe" }
      }
    ]
  }
]

GET /sites/:siteId/preflights/:preflightId

Response 200 OK - full detail. The handler verifies the loaded job's siteId matches the path's :siteId, returning 404 on mismatch:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "COMPLETED",
  "url": "https://main--site--org.hlx.page/some-path",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": { "id": "ABC123@techacct.adobe.com", "displayName": "John Doe" },
  "updatedAt": "2026-05-11T10:00:05.000Z",
  "startedAt": "2026-05-11T10:00:01.000Z",
  "endedAt": "2026-05-11T10:00:05.000Z",
  "result": {},
  "resultType": "INLINE",
  "resultLocation": null,
  "error": null
}

When resultType is S3 or URL, result will be null and the client should fetch from resultLocation.

Data model

AsyncJob will be extended in spacecat-shared-data-access with:

siteId as an optional top-level indexed attribute
jobType as an optional top-level indexed attribute
allBySiteIdAndJobType(siteId, jobType) collection method

Both attributes are optional so that existing job creation paths (including the deprecated /preflight/jobs queue-based flow) continue to work unchanged. url grouping on the list endpoint is an in-memory groupBy on metadata.url after the indexed query - acceptable given the 50-record cap.

createdBy is stored as { id, displayName } in job metadata at creation time - id is profile.email (the IMS user ID) and displayName is composed from profile.first_name + last_name. Both fields are available on the authenticated profile; no additional IMS lookup required. Surfaces in all three endpoint responses for audit purposes.

This is a prerequisite: the spacecat-shared-data-access change must land before controller work begins.

Current endpoint fate

Both legacy endpoint pairs are deprecated and remain functional in parallel with the new endpoints. The Sunset date will be set by PM at the time this ADR moves to Accepted, with a minimum of 90 days from MFE migration start.

Endpoint	Action
POST /preflight/beta/jobs	Deprecated - internal use only; no external consumers
GET /preflight/beta/jobs/:jobId	Deprecated - internal use only; no external consumers
POST /preflight/jobs	Deprecated - queue-based; migration timeline to be coordinated with consumers
GET /preflight/jobs/:jobId	Deprecated - queue-based; migration timeline to be coordinated with consumers

Test plan

Read the ADR and validate the design rationale
Confirm the proposed endpoint shape matches team expectations before implementation starts

?? Generated with Claude Code

Captures the decision to replace /preflight/jobs and /preflight/beta/jobs with site-scoped endpoints under /sites/:siteId/preflights. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-08T18:29:50Z

This PR will trigger no release when merged.

codecov · 2026-05-08T18:37:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AsyncJob has no siteId index; GET /sites/:siteId/preflights requires either extending AsyncJob or a new Preflight entity in shared-data-access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…eId index AsyncJob TTL (~7 days) makes a purpose-built Preflight entity unnecessary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Existing job creation paths must not be affected; new endpoints populate siteId at creation time and query by it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Capture IMS caller identity server-side at job creation for audit purposes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sponse Per RFC 7231 §6.3.3 the created resource URL belongs in the Location response header, not a custom body field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add error response table to POST endpoint (400/403/404/500) - Disabled-site case returns 403 immediately; no job record created - Previous CANCELLED-job behavior on disabled preflight is explicitly removed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Both /preflight/beta/jobs and /preflight/jobs are deprecated and run in parallel with the new endpoints until consumers migrate and a deletion milestone is agreed upon. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bhellema · 2026-05-11T14:55:11Z

+Body:
+```json
+{
+  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",


should this not just be id ? Is this not the id of the aysnc job?

The rename is intentional — jobId leaks the AsyncJob backing model to consumers, which is an implementation detail they shouldn't need to know about. preflightId is the domain name for this resource. Internally it maps 1:1 to the AsyncJob ID.

Sure, preflightId is fine, I figured that id would be enough. The response is a preflight response and here's the id.

bhellema · 2026-05-11T14:56:31Z

+{
+  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
+  "status": "IN_PROGRESS",
+  "createdAt": "2026-05-11T10:00:00.000Z",


Should we include last updatedAt as well.

Good call — updatedAt is already on the GET by ID response. Worth adding to the list items too so clients can detect state changes without fetching each one individually.

bhellema · 2026-05-11T14:56:57Z

+**Request body** (`application/json`):
+```json
+{
+  "url": "https://main--site--org.hlx.page/some-path",


Today you can pass a list of urls.

Yes, and that's an intentional change — one URL per request, one preflightId per URL. Each preflight is an independently pollable resource. If the MFE needs bulk in future, clients fire parallel requests. This avoids partial-failure complexity and keeps each job clean. See the key changes section for the full rationale.

Client's could select hundreds of pages. This is too much load to manage. Use case, customer is ready to run a promotion and they have many different areas of the site they want to run preflight on. Allowing them to send 1 job with many urls makes sense. It could take a while, but running in bulk this way would be expected to be slower.

Same response as above — see the thread on line 44 for the full proposal.

bhellema · 2026-05-11T14:57:25Z

+```json
+{
+  "url": "https://main--site--org.hlx.page/some-path",
+  "step": "identify",


The concept of step should be going away as we always will perform identify/suggest as part of the mysticat agent step.

Good point and worth nailing down now. If the Mysticat agent always runs both identify and suggest as a single flow, step is redundant and we should drop it from the new endpoint entirely rather than carry it forward and deprecate it later. Can you confirm that's the direction? If so we'll remove it from the design.

This is the direction that we want to go.

bhellema · 2026-05-11T14:58:01Z

+{
+  "url": "https://main--site--org.hlx.page/some-path",
+  "step": "identify",
+  "mystiqueUrl": "optional-ephemeral-host.stage.cloud.adobe.io"


I don't see the value of sending this back to the client.

To clarify — mystiqueUrl is a request-only field, never returned in any response. It lets developers route a preflight to an ephemeral Mysticat instance for testing without hitting prod, and is blocked in non-dev environments. If you're questioning whether it belongs in the API at all that's a fair debate — it could move to an env-level config instead.

Let's remove it from the API response.

bhellema · 2026-05-11T15:02:33Z

+|-------|------|----------|-------------|
+| `url` | string (URI) | Yes | The single page URL to analyze |
+| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
+| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |


url i don't think needs to be exposed here.

The url field is required in the request body because Mysticat needs it to scrape the specific page via DRS headless browser — siteId alone isn't enough since a site has many pages and preflight is per-page. As for the response, it is useful in GET by ID so the consumer knows what was analyzed. Could reasonably be omitted from the 202 POST response body given the Location header already points to the resource.

I do not think this should be exposed as part of the public api.

Apologies for the misread — we interpreted this as being about the url field rather than mystiqueUrl. Agreed, mystiqueUrl is removed from the public documentation. It stays in the implementation as an undocumented dev override for testing against ephemeral Mysticat instances, just not part of the public API contract.

bhellema · 2026-05-11T15:02:53Z

+| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
+| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |
+
+`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.


There is a fix that should be set via the header not cookie.

Agreed a header would be cleaner. The cookie approach is inherited from the /auth/promise browser flow where it's set automatically. Switching to a header would require the MFE to read and forward it explicitly — worth a separate cleanup ticket but out of scope for this redesign.

bhellema · 2026-05-11T15:03:16Z

+`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.
+
+`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.
+


createdAt - would be good to report too

createdAt is already in the 202 response body — you may have been looking at an earlier version of the doc before we expanded the response shapes.

bhellema · 2026-05-11T15:05:14Z

+| `500 Internal Server Error` | Mysticat call failed or unexpected error |
+
+No job record is created for `400`, `403`, or `404` responses. The current `/preflight/beta/jobs`
+behavior of creating a job and immediately setting it to `CANCELLED` when preflight is disabled


Cancelled is not the correct status here it feels like cancelled is a user action. Maybe something like ABORTED.

Good point on the semantics, but we've resolved this differently — the new design returns 403 Forbidden immediately when preflight is disabled, without creating a job record at all. No phantom CANCELLED or ABORTED record is created. See the error responses table in the POST section.

Nice! I like it. However as part of the 403, what does the response body look like to allow the UI to provide a meaningful message to the user as if they do not have access to the site, or that the audit was disabled.

Good call. The error body includes an errorCode for machine-readable handling and a message as a human-readable hint:

{ "errorCode": "PREFLIGHT_NOT_ENABLED", "message": "Preflight is not enabled for this site" }

All codes are PREFLIGHT_-prefixed for consistency: PREFLIGHT_ACCESS_DENIED, PREFLIGHT_NOT_ENABLED, PREFLIGHT_INVALID_REQUEST, PREFLIGHT_SITE_NOT_FOUND, PREFLIGHT_INTERNAL_ERROR. Consumers translate errorCode to their own messaging — message is informational only. ADR updated.

Open to other suggestions on the error code style if there's a preferred convention.

I'm happy that at least there's an error code! Using a numeric value just requires another lookup, I don't mind the string.

Agreed — string codes are self-documenting and don't require a lookup table. Happy we're aligned.

bhellema · 2026-05-11T15:36:27Z

+  {
+    "url": "https://main--site--org.hlx.page/some-path",
+    "preflights": [
+      {
+        "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
+        "status": "COMPLETED",
+        "step": "identify",
+        "createdAt": "2026-05-11T10:00:00.000Z",
+        "createdBy": "user@example.com"
+      },
+      {
+        "preflightId": "7c9b1e32-1234-4abc-b3fc-9f8a7c6d5e4f",
+        "status": "COMPLETED",
+        "step": "suggest",
+        "createdAt": "2026-05-11T10:05:00.000Z",
+        "createdBy": "user@example.com"
+      }
+    ]
+  },


This structure is not what I would expect, but rather a list of jobs with some minor details:

[ { "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "COMPLETED", "urlCount": 12, <----- maybe show the number of urls they ran with "opportunities": 5, <------ the number of opportunities that were created "createdAt": "2026-05-11T10:00:00.000Z", "createdBy": "user@example.com" } ]

Listing the preflight jobs should be lightweight and fast. Details of the jobs can be fetched with a follow up with the id of the preflight job.

The flat list is simpler and we can drop the URL grouping — agreed. The opportunities count is an interesting addition; does Mysticat write that back to the job result today? If so it's worth surfacing in the list.

One question on urlCount: that field implies a job can span multiple URLs, but the new design is one URL per preflight. With that model urlCount would always be 1. Was the intent to keep multi-URL jobs, or is urlCount a carry-over from the old design?

I think we must consider the ability for multiple urls. If we get to the point of providing a different experience where they can select many pages and initiate a preflight request, we need a way to make this experience simply for the client, being a headless agent or client ui.

Addressed in the thread on line 44 — the proposal is to keep this endpoint single-URL for now since it was never actually used with multiple URLs in practice, and add a dedicated POST /sites/:siteId/preflights/bulk endpoint when that use case is confirmed. Adding it later is cheap and keeps the design clean in the meantime.

bhellema

We really need to consider what the bulk use case looks like.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Confirmed by Ben: identify/suggest are no longer separate steps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Dev-only override, not part of the public contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier · 2026-05-12T13:30:32Z

@bhellema — created SITES-44432 to track the bulk endpoint use case (POST /sites/:siteId/preflights/bulk). It's linked to SITES-44290 as a follow-on. We can design it properly once the single-URL endpoint ships and the bulk use case is confirmed from the MFE side.

johobot · 2026-05-12T13:48:25Z

+
+`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.
+
+`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.


Worth noting: in the current codebase profile.email holds the IMS user ID string, not necessarily a human-readable email address (see comment in api-key.js). The pattern used by other controllers is authInfo.getProfile().email || 'system'. The rendered createdBy value in the MFE may look like an opaque ID rather than an address — worth aligning on so implementers and the MFE team have the same expectation.

Good catch on the IMS profile ambiguity. We've expanded createdBy from a plain string to an object:

"createdBy": { "id": "ABC123@techacct.adobe.com", "displayName": "John Doe" }

id is profile.email (the IMS user ID — the naming is misleading but consistent with the rest of the codebase, as noted in api-key.js). displayName is composed from profile.first_name + last_name, falling back to profile.name. No extra IMS lookup needed — both fields are on the authenticated profile already. This shape is easy to extend with additional identity fields in future without breaking the API contract. ADR updated.

johobot · 2026-05-12T13:48:25Z

+
+| Status | `errorCode` | Condition |
+|--------|-------------|-----------|
+| `400 Bad Request` | `PREFLIGHT_INVALID_REQUEST` | `url` is missing or invalid |


Two gaps worth closing here:

"Invalid" is not defined. Does the URL need to match a known Helix/AEM hostname pattern? The current implementation validates against site preview/live/CDN URLs via findByPreviewURL. Is that check being preserved, or is URL-to-site resolution truly gone?

If it's gone, there's nothing preventing a caller from submitting a url that belongs to a different site than :siteId. That mismatches context for link normalization and auth token resolution in the audit worker, which uses site.getBaseURL(). Recommend keeping hostname validation and returning a distinct error code if the URL doesn't match the site.

Both points addressed. The controller will validate that the url hostname matches one of the site's known hostnames (base URL, preview URL, or live URL) after loading the site via Site.findById(siteId). A URL that is structurally valid but belongs to a different site returns PREFLIGHT_INVALID_REQUEST — so the cross-site submission risk is closed. The error table and field description in the ADR are updated to make "invalid" explicit. This replaces the implicit validation that findByPreviewURL previously provided as a side effect.

johobot · 2026-05-12T13:48:25Z

+| `403 Forbidden` | `PREFLIGHT_ACCESS_DENIED` | Caller does not have access to the site |
+| `403 Forbidden` | `PREFLIGHT_NOT_ENABLED` | Preflight is not enabled for the site |
+| `404 Not Found` | `PREFLIGHT_SITE_NOT_FOUND` | `siteId` does not exist |
+| `500 Internal Server Error` | `PREFLIGHT_INTERNAL_ERROR` | Mysticat call failed or unexpected error |


When Mysticat itself returns 5xx, 502 Bad Gateway (or 503) is more accurate than 500 — it signals the fault is downstream, not in this service. That distinction matters for client retry logic and on-call triage.

Good catch. Splitting into 502 Bad Gateway / PREFLIGHT_UPSTREAM_ERROR for Mysticat failures and 500 Internal Server Error / PREFLIGHT_INTERNAL_ERROR for unexpected local errors. Updated the ADR.

johobot · 2026-05-12T13:48:25Z

+
+### GET /sites/:siteId/preflights
+
+**Response** `200 OK` — grouped by URL, with a nested array of preflights per page. A site can have preflights for multiple URLs:


Two concerns with this shape:

Every other list endpoint in the service returns a flat array — this URL-grouped structure is a one-off that requires in-memory groupBy in the controller and can't go through the standard DTO path.

No pagination is specified. An active site accumulates 300+ records per 7-day TTL window. The service uses cursor-based limit/cursor pagination elsewhere. Even a default cap needs to be defined here, since adding nextCursor to a bare top-level array later is a breaking change.

On (1): agreed it's a one-off shape, but the MFE benefit of seeing history grouped by URL outweighs the controller complexity for now - the groupBy is a straightforward in-memory step after a single allBySiteIdAndJobType query.

On (2): adding a default cap of 50 most recent preflights per site (sorted by createdAt desc across all URLs before grouping). Given the 7-day AsyncJob TTL the natural upper bound is low in practice. Full cursor-based pagination is deferred - updated the ADR.

johobot · 2026-05-12T13:48:25Z

+| `preflights[].createdBy` | string | IMS email of the user who triggered the preflight |
+
+---
+


The ADR doesn't specify that this handler validates preflightId belongs to the path's :siteId. The current getPreflightJobStatusAndResult loads by preflightId alone with no site ownership check. If that carries forward, a caller can read any preflight result — including scraped draft CMS content — by substituting a different siteId in the path. The handler should load by (siteId, preflightId) composite and return 404 if either mismatches, never by ID alone.

Good security catch. Adding an explicit ownership check to the ADR: the handler loads by preflightId then verifies the stored siteId matches the path's :siteId, returning 404 on mismatch - same response as a non-existent preflightId - so callers cannot confirm a preflight exists by probing with a different site path. Updated the ADR.

johobot · 2026-05-12T13:48:25Z

+  "createdBy": "user@example.com",
+  "updatedAt": "2026-05-11T10:00:05.000Z",
+  "startedAt": "2026-05-11T10:00:01.000Z",
+  "endedAt": "2026-05-11T10:00:05.000Z",


AsyncJob already has resultLocation (S3 URI) and resultType (INLINE/S3/URL) for handling large payloads. In practice, preflight results can be substantial and are often written to S3, leaving result: null in the DB record. Clients polling this endpoint need to know when to follow resultLocation instead. The response table should document both fields and the decision rule.

Good point - adding resultType and resultLocation to the GET by ID response table with the decision rule: when resultType is S3 or URL, result will be null and the client should fetch from resultLocation; when resultType is INLINE, result contains the payload directly and resultLocation is null. Updated the ADR.

johobot · 2026-05-12T13:48:25Z

+- **`jobId` renamed to `preflightId`** in all responses.
+- **`pollUrl` removed from the response body.** Per RFC 7231 §6.3.3, the URL of the created
+  resource is communicated via the `Location` response header. Clients that need to poll for
+  completion read `Location` rather than a body field.


The worker still branches on step in two places:

src/preflight/links.js:102 — if (step === 'suggest') gates the AI suggestion call

src/preflight/metatags.js:103 — if (step === 'suggest') gates metatagsAutoSuggest

Dropping step from the new endpoint without a coordinated worker change means callers silently get identify-only results (the worker defaults to PREFLIGHT_STEP_IDENTIFY when the field is absent). If Mysticat has already been updated to always run both phases unconditionally, this ADR should reference that change. If not, it's a prerequisite alongside the spacecat-shared-data-access change.

Confirmed - Mysticat has no concept of step; it always runs both identify and suggest in a single flow. The branching in links.js and metatags.js is dead code that will be removed as part of this implementation. No separate prerequisite needed. Updated the ADR.

johobot · 2026-05-12T13:48:25Z

+
+**Both existing endpoint pairs are deprecated**, not removed. They will remain functional in
+parallel with the new endpoints until the MFE has migrated and a deletion milestone is agreed
+upon. Deprecation notices should be added to their OpenAPI spec entries and response headers


Sunset: <date-tbd> doesn't satisfy RFC 8594, which requires a concrete date. In practice this tends to become permanent dual maintenance. Recommend committing to a process here: e.g. "date set by PM at the time this ADR moves to Accepted, minimum 90 days from MFE migration start."

Agreed - adopting your suggested process: the Sunset date will be set by PM at the time this ADR moves to Accepted, with a minimum of 90 days from MFE migration start. Updated the ADR.

johobot · 2026-05-12T13:48:25Z

+
+The decision is to **extend `AsyncJob` in `spacecat-shared-data-access`**:
+
+- Add `siteId` as an **optional** top-level indexed attribute on the `AsyncJob` schema


jobType is currently stored inside metadata.jobType (e.g. metadata: { jobType: 'preflight' }), not as a top-level attribute. allBySiteIdAndJobType would therefore need to either (a) promote jobType to a top-level indexed column — another schema change and DB migration — or (b) fetch all jobs by siteId index and filter in-memory on metadata.jobType. Option (b) is fine at low volume but the choice should be explicit. The same applies to url, which also lives in metadata and is used for grouping on the list endpoint.

Good clarification. The ADR already proposes promoting siteId to a top-level indexed column - extending that to also promote jobType as a top-level indexed attribute in the same spacecat-shared-data-access change is the right call, so allBySiteIdAndJobType queries both indexes without touching metadata. The url grouping for the list endpoint remains an in-memory groupBy on metadata.url after the indexed query - acceptable given the 50-record cap. Updated the ADR.

Both fields available on IMS profile at no extra cost. Easier to extend in future without breaking API contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Prevents callers submitting a url belonging to a different site than the :siteId in the path. Returns PREFLIGHT_INVALID_REQUEST if mismatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…Id response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…moved Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…iteId Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Guy Pelletier and others added 2 commits May 8, 2026 14:25

docs: add ADR-002 for preflight API REST redesign

fc0506a

Captures the decision to replace /preflight/jobs and /preflight/beta/jobs with site-scoped endpoints under /sites/:siteId/preflights. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: clarify /preflight/jobs deprecated, /preflight/beta/jobs replaced

58cd0e5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier temporarily deployed to dev-branches May 8, 2026 18:42 — with GitHub Actions Inactive

GeezerPelletier requested review from bhellema, clotton and johobot May 11, 2026 12:53

Guy Pelletier and others added 2 commits May 11, 2026 08:54

docs: add description column to proposed endpoints table in ADR-002

156caad

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add explicit request/response schemas to ADR-002

dc39ae4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier marked this pull request as draft May 11, 2026 13:07

Guy Pelletier and others added 4 commits May 11, 2026 09:09

docs: group GET preflights response by URL in ADR-002

2f9a5a0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add data model open question to ADR-002

483ce8a

AsyncJob has no siteId index; GET /sites/:siteId/preflights requires either extending AsyncJob or a new Preflight entity in shared-data-access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: close data model decision in ADR-002 — extend AsyncJob with sit…

315c06a

…eId index AsyncJob TTL (~7 days) makes a purpose-built Preflight entity unnecessary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: clarify siteId is optional on AsyncJob for backward compatibility

0564516

Existing job creation paths must not be affected; new endpoints populate siteId at creation time and query by it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier temporarily deployed to dev-branches May 11, 2026 13:39 — with GitHub Actions Inactive

Guy Pelletier and others added 2 commits May 11, 2026 09:54

docs(adr): add createdBy field to preflight API design

e46ad98

Capture IMS caller identity server-side at job creation for audit purposes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): replace pollUrl body field with Location header in POST re…

9137373

…sponse Per RFC 7231 §6.3.3 the created resource URL belongs in the Location response header, not a custom body field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier temporarily deployed to dev-branches May 11, 2026 14:17 — with GitHub Actions Inactive

Guy Pelletier and others added 3 commits May 11, 2026 10:26

docs(adr): clarify preflight/beta/jobs had no external consumers

2fe6c83

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier marked this pull request as ready for review May 11, 2026 14:42

Merge branch 'main' into sites-44290-preflight-api-rest-redesign

1deff77

GeezerPelletier temporarily deployed to dev-branches May 11, 2026 14:55 — with GitHub Actions Inactive

bhellema reviewed May 11, 2026

View reviewed changes

Guy Pelletier and others added 3 commits May 11, 2026 13:34

docs(adr): add updatedAt to GET all preflights list response

a8fb7fc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): remove step field — Mysticat agent always runs full flow

077c3fd

Confirmed by Ben: identify/suggest are no longer separate steps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): remove mystiqueUrl from public API documentation

fd027b5

Dev-only override, not part of the public contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Guy Pelletier and others added 2 commits May 11, 2026 13:45

docs(adr): add errorCode to error response shape for POST preflights

f091baf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): standardize error codes with PREFLIGHT_ prefix

d5a4298

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier temporarily deployed to dev-branches May 11, 2026 18:02 — with GitHub Actions Inactive

Merge branch 'main' into sites-44290-preflight-api-rest-redesign

1180527

GeezerPelletier temporarily deployed to dev-branches May 12, 2026 13:29 — with GitHub Actions Inactive

johobot reviewed May 12, 2026

View reviewed changes

Guy Pelletier and others added 9 commits May 12, 2026 11:26

docs(adr): expand createdBy to object with id and displayName

dbaff69

Both fields available on IMS profile at no extra cost. Easier to extend in future without breaking API contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): add url hostname validation against site's known URLs

06e6600

Prevents callers submitting a url belonging to a different site than the :siteId in the path. Returns PREFLIGHT_INVALID_REQUEST if mismatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): split 500 into 502 for Mysticat failures and 500 for local…

0f23de6

… errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): cap list endpoint at 50 most recent preflights, defer pagi…

00ac642

…nation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): require siteId ownership check on GET by preflightId

12180f3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): document resultType and resultLocation in GET by preflight…

6ace542

…Id response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): note step branching in links.js and metatags.js will be re…

4829a4c

…moved Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): commit to Sunset date process per RFC 8594

188b7d6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): promote jobType to top-level indexed attribute alongside s…

398ad14

…iteId Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

GeezerPelletier temporarily deployed to dev-branches May 12, 2026 16:14 — with GitHub Actions Inactive

Merge branch 'main' into sites-44290-preflight-api-rest-redesign

6733b5b

GeezerPelletier temporarily deployed to dev-branches May 12, 2026 17:07 — with GitHub Actions Inactive

GeezerPelletier requested review from bhellema and johobot May 12, 2026 17:41

		`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.

		`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.


		### GET /sites/:siteId/preflights

		Response `200 OK` — grouped by URL, with a nested array of preflights per page. A site can have preflights for multiple URLs:

		\| `preflights[].createdBy` \| string \| IMS email of the user who triggered the preflight \|

		---


		The decision is to extend `AsyncJob` in `spacecat-shared-data-access`:

		- Add `siteId` as an optional top-level indexed attribute on the `AsyncJob` schema

Conversation

GeezerPelletier commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's being proposed

POST /sites/:siteId/preflights

GET /sites/:siteId/preflights

GET /sites/:siteId/preflights/:preflightId

Data model

Current endpoint fate

Test plan

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

codecov Bot commented May 8, 2026

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bhellema May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bhellema May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

GeezerPelletier commented May 8, 2026 •

edited

Loading

bhellema May 11, 2026 •

edited

Loading

bhellema May 11, 2026 •

edited

Loading