Skip to content

fix: return 404 for stale cache entries with missing storage objects#223

Merged
LouisHaftmann merged 1 commit intofalcondev-oss:devfrom
mpecan:dev
Apr 14, 2026
Merged

fix: return 404 for stale cache entries with missing storage objects#223
LouisHaftmann merged 1 commit intofalcondev-oss:devfrom
mpecan:dev

Conversation

@mpecan
Copy link
Copy Markdown
Contributor

@mpecan mpecan commented Apr 13, 2026

Closes #222

Problem

When a cache entry exists in the DB but its backing storage object has been lost (interrupted upload, manual deletion, lifecycle rule, partial bucket wipe), GET /download/:cacheEntryId threw an unhandled error and returned 500. BuildKit's cache-import client retries on 500 and hangs the workflow instead of falling back to the upstream registry.

Solution

Return 404 when the storage object is missing so clients treat it as a clean cache miss.

The download path has three branches based on storage_locations.mergedAt / mergeStartedAt:

  • Merged blob pathadapter.createDownloadStream(merged) already awaits a synchronous existence check in each adapter, so errors surface before any response bytes are written. The outer try/catch just needs to recognize them.
  • Parts-streaming paths (fresh entry and mergeStartedAt-but-not-merged) — these were lazy: the generator opened parts only when the consumer began reading, so a missing object would surface after the HTTP response was already framed as 200, truncating the client. Now pre-validated via countFilesInFolder before returning the response stream.

When a stale entry is detected, the server logs a warning and returns undefined; the route yields 404; the existing cleanup task reaps the row on its normal schedule. No DB mutation on the read path.

Changes

  • New typed ObjectNotFoundError raised by each adapter's createDownloadStream when the object is missing.
  • New private Storage.ensurePartsExist(location) — uses countFilesInFolder to pre-probe before committing to an HTTP response.
  • Storage.download() wraps the body in a single try/catch that converts ObjectNotFoundError into undefined (→ 404) and rethrows everything else.
  • FileSystemAdapter.countFilesInFolder now returns 0 for a missing folder instead of throwing ENOENT, matching S3/GCS list-by-prefix semantics.
  • Removed a dead inner try/catch around the merge-upload promise setup (uploadStream is async, so a sync throw isn't possible).

Testing

  • pnpm run type-check
  • pnpm run lint
  • pnpm run test:run ✓ (new tests pass under filesystem + sqlite locally; full 3×3 matrix runs in CI)

New tests/stale-cache.test.ts covers both parts-streaming paths by wiping storage through Storage.getAdapterFromEnv() so it runs under all three storage drivers in CI:

  1. Unmerged entry — save → wipe → restore twice → expect cache miss (exercises `ensurePartsExist` with zero parts).
  2. Merged entry — save → restore (triggers background merge) → wait for merge → wipe → restore twice → expect cache miss (exercises the synchronous existence check on the merged blob).

Representative server log from a run confirming the fix:

```
[warn] Stale cache entry 4f31d42e-...: Object not found in storage: 1550524356/merged
[error] Response: GET /download/4f31d42e-... > 404
```

Notes

  • No new dependencies.
  • No schema changes.
  • No changes to any public HTTP contract — stale entries now return 404 where they previously 500'd.

When a cache entry exists in the database but the underlying storage
object has been lost (incomplete upload, manual deletion, lifecycle
policy), the download route previously threw and returned 500, causing
BuildKit clients to hang instead of falling back to the upstream
registry.

Storage.download() now pre-validates part presence via
countFilesInFolder before committing to an HTTP response, and adapters
throw a typed ObjectNotFoundError when the merged blob is missing.
Both are caught in the outer try/catch, which logs a warning and
returns undefined so the route returns 404. Stale rows are left for
the existing cleanup task to reap, avoiding mutating DB state on a
read path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LouisHaftmann LouisHaftmann merged commit fc97bc0 into falcondev-oss:dev Apr 14, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Download of cache entry with missing storage object returns 500 and hangs BuildKit clients

2 participants