Skip to content

docs: ADR-002 preflight API REST redesign#2377

Open
GeezerPelletier wants to merge 30 commits into
mainfrom
sites-44290-preflight-api-rest-redesign
Open

docs: ADR-002 preflight API REST redesign#2377
GeezerPelletier wants to merge 30 commits into
mainfrom
sites-44290-preflight-api-rest-redesign

Conversation

@GeezerPelletier
Copy link
Copy Markdown
Contributor

@GeezerPelletier GeezerPelletier commented May 8, 2026

Summary

  • Adds docs/decisions/002-preflight-api-rest-redesign.md capturing the decision to replace /preflight/beta/jobs with site-scoped REST endpoints under /sites/:siteId/preflights, and to deprecate both legacy endpoint pairs
  • No code changes - this is a decision record for team review before implementation begins

Related: SITES-44290 | Bulk endpoint deferred to: SITES-44432

What's being proposed

POST /sites/:siteId/preflights

Request body:

{
  "url": "https://main--site--org.hlx.page/some-path"
}

promiseToken passed via cookie for authenticated CMS pages. createdBy is derived server-side from the caller's IMS profile and never supplied by the client. step is removed - Mysticat always runs the full identify/suggest flow. The submitted url must match one of the site's known hostnames (base URL, preview URL, or live URL) - a mismatch returns PREFLIGHT_INVALID_REQUEST.

Response 202 Accepted:

Headers:

Location: https://spacecat.experiencecloud.live/api/v1/sites/{siteId}/preflights/{preflightId}

Body:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "IN_PROGRESS",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": {
    "id": "ABC123@techacct.adobe.com",
    "displayName": "John Doe"
  }
}

Error responses:

Status errorCode Condition
400 Bad Request PREFLIGHT_INVALID_REQUEST url is missing, not a valid URI, or does not belong to the site identified by :siteId
403 Forbidden PREFLIGHT_ACCESS_DENIED Caller does not have access to the site
403 Forbidden PREFLIGHT_NOT_ENABLED Preflight is not enabled for the site
404 Not Found PREFLIGHT_SITE_NOT_FOUND siteId does not exist
502 Bad Gateway PREFLIGHT_UPSTREAM_ERROR Mysticat returned a 5xx response
500 Internal Server Error PREFLIGHT_INTERNAL_ERROR Unexpected error within this service

Error response body:

{
  "errorCode": "PREFLIGHT_NOT_ENABLED",
  "message": "Preflight is not enabled for this site"
}

No job record is created for 400, 403, or 404 responses.

GET /sites/:siteId/preflights

Response 200 OK - grouped by URL, capped at 50 most recent preflights per site (sorted by createdAt desc across all URLs before grouping):

[
  {
    "url": "https://main--site--org.hlx.page/some-path",
    "preflights": [
      {
        "preflightId": "3fa85f64-...",
        "status": "COMPLETED",
        "createdAt": "2026-05-11T10:00:00.000Z",
        "updatedAt": "2026-05-11T10:00:05.000Z",
        "createdBy": { "id": "ABC123@techacct.adobe.com", "displayName": "John Doe" }
      }
    ]
  }
]

GET /sites/:siteId/preflights/:preflightId

Response 200 OK - full detail. The handler verifies the loaded job's siteId matches the path's :siteId, returning 404 on mismatch:

{
  "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "COMPLETED",
  "url": "https://main--site--org.hlx.page/some-path",
  "createdAt": "2026-05-11T10:00:00.000Z",
  "createdBy": { "id": "ABC123@techacct.adobe.com", "displayName": "John Doe" },
  "updatedAt": "2026-05-11T10:00:05.000Z",
  "startedAt": "2026-05-11T10:00:01.000Z",
  "endedAt": "2026-05-11T10:00:05.000Z",
  "result": {},
  "resultType": "INLINE",
  "resultLocation": null,
  "error": null
}

When resultType is S3 or URL, result will be null and the client should fetch from resultLocation.

Data model

AsyncJob will be extended in spacecat-shared-data-access with:

  • siteId as an optional top-level indexed attribute
  • jobType as an optional top-level indexed attribute
  • allBySiteIdAndJobType(siteId, jobType) collection method

Both attributes are optional so that existing job creation paths (including the deprecated /preflight/jobs queue-based flow) continue to work unchanged. url grouping on the list endpoint is an in-memory groupBy on metadata.url after the indexed query - acceptable given the 50-record cap.

createdBy is stored as { id, displayName } in job metadata at creation time - id is profile.email (the IMS user ID) and displayName is composed from profile.first_name + last_name. Both fields are available on the authenticated profile; no additional IMS lookup required. Surfaces in all three endpoint responses for audit purposes.

This is a prerequisite: the spacecat-shared-data-access change must land before controller work begins.

Current endpoint fate

Both legacy endpoint pairs are deprecated and remain functional in parallel with the new endpoints. The Sunset date will be set by PM at the time this ADR moves to Accepted, with a minimum of 90 days from MFE migration start.

Endpoint Action
POST /preflight/beta/jobs Deprecated - internal use only; no external consumers
GET /preflight/beta/jobs/:jobId Deprecated - internal use only; no external consumers
POST /preflight/jobs Deprecated - queue-based; migration timeline to be coordinated with consumers
GET /preflight/jobs/:jobId Deprecated - queue-based; migration timeline to be coordinated with consumers

Test plan

  • Read the ADR and validate the design rationale
  • Confirm the proposed endpoint shape matches team expectations before implementation starts

?? Generated with Claude Code

Guy Pelletier and others added 2 commits May 8, 2026 14:25
Captures the decision to replace /preflight/jobs and /preflight/beta/jobs
with site-scoped endpoints under /sites/:siteId/preflights.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

This PR will trigger no release when merged.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Guy Pelletier and others added 2 commits May 11, 2026 08:54
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier GeezerPelletier marked this pull request as draft May 11, 2026 13:07
Guy Pelletier and others added 4 commits May 11, 2026 09:09
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AsyncJob has no siteId index; GET /sites/:siteId/preflights requires
either extending AsyncJob or a new Preflight entity in shared-data-access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eId index

AsyncJob TTL (~7 days) makes a purpose-built Preflight entity unnecessary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Existing job creation paths must not be affected; new endpoints
populate siteId at creation time and query by it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 2 commits May 11, 2026 09:54
Capture IMS caller identity server-side at job creation for audit purposes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sponse

Per RFC 7231 §6.3.3 the created resource URL belongs in the Location
response header, not a custom body field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 3 commits May 11, 2026 10:26
- Add error response table to POST endpoint (400/403/404/500)
- Disabled-site case returns 403 immediately; no job record created
- Previous CANCELLED-job behavior on disabled preflight is explicitly removed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both /preflight/beta/jobs and /preflight/jobs are deprecated and run in
parallel with the new endpoints until consumers migrate and a deletion
milestone is agreed upon.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier GeezerPelletier marked this pull request as ready for review May 11, 2026 14:42
Body:
```json
{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not just be id ? Is this not the id of the aysnc job?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rename is intentional — jobId leaks the AsyncJob backing model to consumers, which is an implementation detail they shouldn't need to know about. preflightId is the domain name for this resource. Internally it maps 1:1 to the AsyncJob ID.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, preflightId is fine, I figured that id would be enough. The response is a preflight response and here's the id.

{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "IN_PROGRESS",
"createdAt": "2026-05-11T10:00:00.000Z",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include last updatedAt as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — updatedAt is already on the GET by ID response. Worth adding to the list items too so clients can detect state changes without fetching each one individually.

**Request body** (`application/json`):
```json
{
"url": "https://main--site--org.hlx.page/some-path",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today you can pass a list of urls.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that's an intentional change — one URL per request, one preflightId per URL. Each preflight is an independently pollable resource. If the MFE needs bulk in future, clients fire parallel requests. This avoids partial-failure complexity and keeps each job clean. See the key changes section for the full rationale.

Copy link
Copy Markdown
Contributor

@bhellema bhellema May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client's could select hundreds of pages. This is too much load to manage. Use case, customer is ready to run a promotion and they have many different areas of the site they want to run preflight on. Allowing them to send 1 job with many urls makes sense. It could take a while, but running in bulk this way would be expected to be slower.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as above — see the thread on line 44 for the full proposal.

```json
{
"url": "https://main--site--org.hlx.page/some-path",
"step": "identify",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept of step should be going away as we always will perform identify/suggest as part of the mysticat agent step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point and worth nailing down now. If the Mysticat agent always runs both identify and suggest as a single flow, step is redundant and we should drop it from the new endpoint entirely rather than carry it forward and deprecate it later. Can you confirm that's the direction? If so we'll remove it from the design.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the direction that we want to go.

{
"url": "https://main--site--org.hlx.page/some-path",
"step": "identify",
"mystiqueUrl": "optional-ephemeral-host.stage.cloud.adobe.io"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the value of sending this back to the client.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify — mystiqueUrl is a request-only field, never returned in any response. It lets developers route a preflight to an ephemeral Mysticat instance for testing without hitting prod, and is blocked in non-dev environments. If you're questioning whether it belongs in the API at all that's a fair debate — it could move to an env-level config instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it from the API response.

|-------|------|----------|-------------|
| `url` | string (URI) | Yes | The single page URL to analyze |
| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url i don't think needs to be exposed here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url field is required in the request body because Mysticat needs it to scrape the specific page via DRS headless browser — siteId alone isn't enough since a site has many pages and preflight is per-page. As for the response, it is useful in GET by ID so the consumer knows what was analyzed. Could reasonably be omitted from the 202 POST response body given the Location header already points to the resource.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this should be exposed as part of the public api.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the misread — we interpreted this as being about the url field rather than mystiqueUrl. Agreed, mystiqueUrl is removed from the public documentation. It stays in the implementation as an undocumented dev override for testing against ephemeral Mysticat instances, just not part of the public API contract.

| `step` | enum: `identify` \| `suggest` | Yes | Audit step to perform |
| `mystiqueUrl` | string | No | Dev-only override for the Mysticat service URL |

`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fix that should be set via the header not cookie.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed a header would be cleaner. The cookie approach is inherited from the /auth/promise browser flow where it's set automatically. Switching to a header would require the MFE to read and forward it explicitly — worth a separate cleanup ticket but out of scope for this redesign.

`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.

`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createdAt - would be good to report too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createdAt is already in the 202 response body — you may have been looking at an earlier version of the doc before we expanded the response shapes.

| `500 Internal Server Error` | Mysticat call failed or unexpected error |

No job record is created for `400`, `403`, or `404` responses. The current `/preflight/beta/jobs`
behavior of creating a job and immediately setting it to `CANCELLED` when preflight is disabled
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancelled is not the correct status here it feels like cancelled is a user action. Maybe something like ABORTED.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the semantics, but we've resolved this differently — the new design returns 403 Forbidden immediately when preflight is disabled, without creating a job record at all. No phantom CANCELLED or ABORTED record is created. See the error responses table in the POST section.

Copy link
Copy Markdown
Contributor

@bhellema bhellema May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like it. However as part of the 403, what does the response body look like to allow the UI to provide a meaningful message to the user as if they do not have access to the site, or that the audit was disabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. The error body includes an errorCode for machine-readable handling and a message as a human-readable hint:

{
  "errorCode": "PREFLIGHT_NOT_ENABLED",
  "message": "Preflight is not enabled for this site"
}

All codes are PREFLIGHT_-prefixed for consistency: PREFLIGHT_ACCESS_DENIED, PREFLIGHT_NOT_ENABLED, PREFLIGHT_INVALID_REQUEST, PREFLIGHT_SITE_NOT_FOUND, PREFLIGHT_INTERNAL_ERROR. Consumers translate errorCode to their own messaging — message is informational only. ADR updated.

Open to other suggestions on the error code style if there's a preferred convention.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy that at least there's an error code! Using a numeric value just requires another lookup, I don't mind the string.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — string codes are self-documenting and don't require a lookup table. Happy we're aligned.

Comment on lines +116 to +134
{
"url": "https://main--site--org.hlx.page/some-path",
"preflights": [
{
"preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "COMPLETED",
"step": "identify",
"createdAt": "2026-05-11T10:00:00.000Z",
"createdBy": "user@example.com"
},
{
"preflightId": "7c9b1e32-1234-4abc-b3fc-9f8a7c6d5e4f",
"status": "COMPLETED",
"step": "suggest",
"createdAt": "2026-05-11T10:05:00.000Z",
"createdBy": "user@example.com"
}
]
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure is not what I would expect, but rather a list of jobs with some minor details:

[
  {
      "preflightId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "status": "COMPLETED",
      "urlCount": 12,   <----- maybe show the number of urls they ran with
      "opportunities": 5,  <------ the number of opportunities that were created
      "createdAt": "2026-05-11T10:00:00.000Z",
      "createdBy": "user@example.com"
  }
]

Listing the preflight jobs should be lightweight and fast. Details of the jobs can be fetched with a follow up with the id of the preflight job.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flat list is simpler and we can drop the URL grouping — agreed. The opportunities count is an interesting addition; does Mysticat write that back to the job result today? If so it's worth surfacing in the list.

One question on urlCount: that field implies a job can span multiple URLs, but the new design is one URL per preflight. With that model urlCount would always be 1. Was the intent to keep multi-URL jobs, or is urlCount a carry-over from the old design?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must consider the ability for multiple urls. If we get to the point of providing a different experience where they can select many pages and initiate a preflight request, we need a way to make this experience simply for the client, being a headless agent or client ui.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the thread on line 44 — the proposal is to keep this endpoint single-URL for now since it was never actually used with multiple URLs in practice, and add a dedicated POST /sites/:siteId/preflights/bulk endpoint when that use case is confirmed. Adding it later is cheap and keeps the design clean in the meantime.

Copy link
Copy Markdown
Contributor

@bhellema bhellema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to consider what the bulk use case looks like.

Guy Pelletier and others added 3 commits May 11, 2026 13:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Confirmed by Ben: identify/suggest are no longer separate steps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dev-only override, not part of the public contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guy Pelletier and others added 2 commits May 11, 2026 13:45
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

@bhellema — created SITES-44432 to track the bulk endpoint use case (POST /sites/:siteId/preflights/bulk). It's linked to SITES-44290 as a follow-on. We can design it properly once the single-URL endpoint ships and the bulk use case is confirmed from the MFE side.


`promiseToken` is passed via cookie for authenticated CMS pages (CS/CS_CW/AMS sites); it is not part of the request body.

`createdBy` is derived server-side from the caller's IMS profile (`authInfo.getProfile().email`) and is never supplied by the client.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting: in the current codebase profile.email holds the IMS user ID string, not necessarily a human-readable email address (see comment in api-key.js). The pattern used by other controllers is authInfo.getProfile().email || 'system'. The rendered createdBy value in the MFE may look like an opaque ID rather than an address — worth aligning on so implementers and the MFE team have the same expectation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the IMS profile ambiguity. We've expanded createdBy from a plain string to an object:

"createdBy": {
  "id": "ABC123@techacct.adobe.com",
  "displayName": "John Doe"
}

id is profile.email (the IMS user ID — the naming is misleading but consistent with the rest of the codebase, as noted in api-key.js). displayName is composed from profile.first_name + last_name, falling back to profile.name. No extra IMS lookup needed — both fields are on the authenticated profile already. This shape is easy to extend with additional identity fields in future without breaking the API contract. ADR updated.


| Status | `errorCode` | Condition |
|--------|-------------|-----------|
| `400 Bad Request` | `PREFLIGHT_INVALID_REQUEST` | `url` is missing or invalid |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two gaps worth closing here:

  1. "Invalid" is not defined. Does the URL need to match a known Helix/AEM hostname pattern? The current implementation validates against site preview/live/CDN URLs via findByPreviewURL. Is that check being preserved, or is URL-to-site resolution truly gone?

  2. If it's gone, there's nothing preventing a caller from submitting a url that belongs to a different site than :siteId. That mismatches context for link normalization and auth token resolution in the audit worker, which uses site.getBaseURL(). Recommend keeping hostname validation and returning a distinct error code if the URL doesn't match the site.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both points addressed. The controller will validate that the url hostname matches one of the site's known hostnames (base URL, preview URL, or live URL) after loading the site via Site.findById(siteId). A URL that is structurally valid but belongs to a different site returns PREFLIGHT_INVALID_REQUEST — so the cross-site submission risk is closed. The error table and field description in the ADR are updated to make "invalid" explicit. This replaces the implicit validation that findByPreviewURL previously provided as a side effect.

| `403 Forbidden` | `PREFLIGHT_ACCESS_DENIED` | Caller does not have access to the site |
| `403 Forbidden` | `PREFLIGHT_NOT_ENABLED` | Preflight is not enabled for the site |
| `404 Not Found` | `PREFLIGHT_SITE_NOT_FOUND` | `siteId` does not exist |
| `500 Internal Server Error` | `PREFLIGHT_INTERNAL_ERROR` | Mysticat call failed or unexpected error |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Mysticat itself returns 5xx, 502 Bad Gateway (or 503) is more accurate than 500 — it signals the fault is downstream, not in this service. That distinction matters for client retry logic and on-call triage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Splitting into 502 Bad Gateway / PREFLIGHT_UPSTREAM_ERROR for Mysticat failures and 500 Internal Server Error / PREFLIGHT_INTERNAL_ERROR for unexpected local errors. Updated the ADR.


### GET /sites/:siteId/preflights

**Response** `200 OK` — grouped by URL, with a nested array of preflights per page. A site can have preflights for multiple URLs:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concerns with this shape:

  1. Every other list endpoint in the service returns a flat array — this URL-grouped structure is a one-off that requires in-memory groupBy in the controller and can't go through the standard DTO path.
  2. No pagination is specified. An active site accumulates 300+ records per 7-day TTL window. The service uses cursor-based limit/cursor pagination elsewhere. Even a default cap needs to be defined here, since adding nextCursor to a bare top-level array later is a breaking change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On (1): agreed it's a one-off shape, but the MFE benefit of seeing history grouped by URL outweighs the controller complexity for now - the groupBy is a straightforward in-memory step after a single allBySiteIdAndJobType query.

On (2): adding a default cap of 50 most recent preflights per site (sorted by createdAt desc across all URLs before grouping). Given the 7-day AsyncJob TTL the natural upper bound is low in practice. Full cursor-based pagination is deferred - updated the ADR.

| `preflights[].createdBy` | string | IMS email of the user who triggered the preflight |

---

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ADR doesn't specify that this handler validates preflightId belongs to the path's :siteId. The current getPreflightJobStatusAndResult loads by preflightId alone with no site ownership check. If that carries forward, a caller can read any preflight result — including scraped draft CMS content — by substituting a different siteId in the path. The handler should load by (siteId, preflightId) composite and return 404 if either mismatches, never by ID alone.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good security catch. Adding an explicit ownership check to the ADR: the handler loads by preflightId then verifies the stored siteId matches the path's :siteId, returning 404 on mismatch - same response as a non-existent preflightId - so callers cannot confirm a preflight exists by probing with a different site path. Updated the ADR.

"createdBy": "user@example.com",
"updatedAt": "2026-05-11T10:00:05.000Z",
"startedAt": "2026-05-11T10:00:01.000Z",
"endedAt": "2026-05-11T10:00:05.000Z",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncJob already has resultLocation (S3 URI) and resultType (INLINE/S3/URL) for handling large payloads. In practice, preflight results can be substantial and are often written to S3, leaving result: null in the DB record. Clients polling this endpoint need to know when to follow resultLocation instead. The response table should document both fields and the decision rule.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - adding resultType and resultLocation to the GET by ID response table with the decision rule: when resultType is S3 or URL, result will be null and the client should fetch from resultLocation; when resultType is INLINE, result contains the payload directly and resultLocation is null. Updated the ADR.

- **`jobId` renamed to `preflightId`** in all responses.
- **`pollUrl` removed from the response body.** Per RFC 7231 §6.3.3, the URL of the created
resource is communicated via the `Location` response header. Clients that need to poll for
completion read `Location` rather than a body field.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker still branches on step in two places:

  • src/preflight/links.js:102if (step === 'suggest') gates the AI suggestion call
  • src/preflight/metatags.js:103if (step === 'suggest') gates metatagsAutoSuggest

Dropping step from the new endpoint without a coordinated worker change means callers silently get identify-only results (the worker defaults to PREFLIGHT_STEP_IDENTIFY when the field is absent). If Mysticat has already been updated to always run both phases unconditionally, this ADR should reference that change. If not, it's a prerequisite alongside the spacecat-shared-data-access change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed - Mysticat has no concept of step; it always runs both identify and suggest in a single flow. The branching in links.js and metatags.js is dead code that will be removed as part of this implementation. No separate prerequisite needed. Updated the ADR.


**Both existing endpoint pairs are deprecated**, not removed. They will remain functional in
parallel with the new endpoints until the MFE has migrated and a deletion milestone is agreed
upon. Deprecation notices should be added to their OpenAPI spec entries and response headers
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sunset: <date-tbd> doesn't satisfy RFC 8594, which requires a concrete date. In practice this tends to become permanent dual maintenance. Recommend committing to a process here: e.g. "date set by PM at the time this ADR moves to Accepted, minimum 90 days from MFE migration start."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - adopting your suggested process: the Sunset date will be set by PM at the time this ADR moves to Accepted, with a minimum of 90 days from MFE migration start. Updated the ADR.


The decision is to **extend `AsyncJob` in `spacecat-shared-data-access`**:

- Add `siteId` as an **optional** top-level indexed attribute on the `AsyncJob` schema
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobType is currently stored inside metadata.jobType (e.g. metadata: { jobType: 'preflight' }), not as a top-level attribute. allBySiteIdAndJobType would therefore need to either (a) promote jobType to a top-level indexed column — another schema change and DB migration — or (b) fetch all jobs by siteId index and filter in-memory on metadata.jobType. Option (b) is fine at low volume but the choice should be explicit. The same applies to url, which also lives in metadata and is used for grouping on the list endpoint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good clarification. The ADR already proposes promoting siteId to a top-level indexed column - extending that to also promote jobType as a top-level indexed attribute in the same spacecat-shared-data-access change is the right call, so allBySiteIdAndJobType queries both indexes without touching metadata. The url grouping for the list endpoint remains an in-memory groupBy on metadata.url after the indexed query - acceptable given the 50-record cap. Updated the ADR.

Guy Pelletier and others added 9 commits May 12, 2026 11:26
Both fields available on IMS profile at no extra cost.
Easier to extend in future without breaking API contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents callers submitting a url belonging to a different site than
the :siteId in the path. Returns PREFLIGHT_INVALID_REQUEST if mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Id response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…moved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iteId

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants