fix(admin): keep Cumulative Users scan alive on Cloud Run#6679
Conversation
The Cumulative Users chart was showing "no data available" on production because the daily-new-users route was OOM-killing the Cloud Run container (SIGABRT / signal 6) while iterating through ~112K Firebase Auth users. 512Mi wasn't enough headroom for Next.js plus the `listUsers()` cursor, and the process died mid-response. Three fixes:

- Persist the computed daily series to Redis under a 30-minute TTL so only one instance ever pays the full scan cost per window — subsequent requests (including cold starts on other instances) read the cached series instantly.
- Yield to the event loop between `listUsers()` pages so V8 can collect the previous batch of UserRecord objects before the next one arrives, keeping peak memory flat across the scan.
- Bump the Cloud Run revision to `--memory=1Gi --cpu=1` in the deploy workflow as a safety margin. The live service was already hot-patched to 1Gi, so production stays up before this workflow runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
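The second fix can be sketched as follows. This is a hedged illustration, not the PR's actual code: `countUsersByDay` is a hypothetical helper, and `listPage` stands in for firebase-admin's `getAuth().listUsers(1000, pageToken)` cursor (the real call resolves to `{ users, pageToken }`). The `setImmediate` yield between pages is the relevant part:

```typescript
// Page-by-page scan with an event-loop yield between pages.
// `listPage` is a stand-in for firebase-admin's listUsers() cursor.
type UserRecord = { uid: string; metadata: { creationTime: string } };
type ListResult = { users: UserRecord[]; pageToken?: string };

// Resolves on the next macrotask turn, giving V8 a chance to collect
// the previous batch of UserRecord objects before the next one arrives.
const yieldToEventLoop = () => new Promise<void>((resolve) => setImmediate(resolve));

async function countUsersByDay(
  listPage: (pageToken?: string) => Promise<ListResult>,
): Promise<Map<string, number>> {
  const perDay = new Map<string, number>();
  let pageToken: string | undefined;
  do {
    const page = await listPage(pageToken);
    for (const u of page.users) {
      // Bucket by UTC day, e.g. "2024-05-01".
      const day = new Date(u.metadata.creationTime).toISOString().slice(0, 10);
      perDay.set(day, (perDay.get(day) ?? 0) + 1);
    }
    pageToken = page.pageToken;
    await yieldToEventLoop(); // keep peak memory flat across the scan
  } while (pageToken);
  return perDay;
}
```

The yield costs one macrotask turn per 1000-user page, which is negligible next to the network round-trip, but it prevents the retained-batch growth that exhausted the 512Mi container.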
Greptile Summary

This PR fixes a production OOM crash (SIGABRT/signal-6) in the daily-new-users route.

Confidence Score: 5/5

Safe to merge — fixes a confirmed production OOM with no correctness, security, or data-integrity concerns. All three layers of the fix are correct: the in-memory/Redis/rebuild ordering is intentional and handles cross-instance cache population, the `setImmediate` yield correctly allows V8 to GC each page batch, and the Redis helpers are fail-open, matching the existing codebase pattern. Redis credentials were already present in the deployment secrets. No P0/P1 findings. No files require special attention.
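The fail-open pattern the review refers to can be sketched like this. The names `getJsonCache`/`setJsonCache` come from the summary, but the signatures here are assumptions, and `RedisLike` is a minimal stand-in for the real client (node-redis exposes `get`/`set` with an `EX` option in roughly this shape):

```typescript
// Fail-open JSON cache helpers: a Redis outage degrades to a cache miss,
// never to a failed request.
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, opts: { EX: number }): Promise<unknown>;
}

async function getJsonCache<T>(client: RedisLike, key: string): Promise<T | null> {
  try {
    const raw = await client.get(key);
    return raw === null ? null : (JSON.parse(raw) as T);
  } catch {
    return null; // fail open: treat Redis errors (or bad JSON) as a miss
  }
}

async function setJsonCache(
  client: RedisLike,
  key: string,
  value: unknown,
  ttlSeconds: number,
): Promise<void> {
  try {
    await client.set(key, JSON.stringify(value), { EX: ttlSeconds });
  } catch {
    // fail open: dropping a cache write never breaks the response
  }
}
```

With this shape, a Redis outage simply forces the next request down the rebuild path rather than surfacing a 500.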
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Route as route.ts (getSeries)
    participant Mem as In-Memory Cache
    participant Redis as Redis (30 min TTL)
    participant FB as Firebase Auth
    Client->>Route: GET /api/omi/stats/daily-new-users
    Route->>Mem: cachedSeries fresh?
    alt In-memory hit (< 30 min)
        Mem-->>Route: CachedSeries
        Route-->>Client: JSON response
    else In-memory miss/stale
        Route->>Redis: getJsonCache(REDIS_KEY)
        alt Redis hit (generatedAt < 30 min)
            Redis-->>Route: CachedSeries
            Route->>Mem: update cachedSeries
            Route-->>Client: JSON response
        else Redis miss/stale
            alt pendingBuild running?
                Route-->>Route: await existing pendingBuild
            else No pending build
                Route->>FB: listUsers(1000, pageToken) x N pages
                Note over Route,FB: setImmediate yield between pages (GC)
                FB-->>Route: UserRecord batches
                Route->>Redis: setJsonCache(series, 30 min TTL)
                Route->>Mem: update cachedSeries
                Route-->>Client: JSON response
            end
        end
    end
```
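The lookup ordering in the diagram can be sketched in TypeScript. `cachedSeries` and `pendingBuild` follow the names in the diagram; `readRedis`, `writeRedis`, and `build` are stand-ins for the real cache helpers and the `listUsers()` scan, so treat this as an illustrative shape rather than the PR's code:

```typescript
// Three-layer lookup: process-local memory, then Redis (cross-instance),
// then a single deduplicated rebuild shared by concurrent requests.
const TTL_MS = 30 * 60 * 1000;

type CachedSeries = {
  generatedAt: number;
  series: Array<{ date: string; count: number }>;
};

let cachedSeries: CachedSeries | null = null;
let pendingBuild: Promise<CachedSeries> | null = null;

async function getSeries(
  readRedis: () => Promise<CachedSeries | null>, // stand-in for getJsonCache
  writeRedis: (v: CachedSeries) => Promise<void>, // stand-in for setJsonCache
  build: () => Promise<CachedSeries>, // stand-in for the full Firebase scan
): Promise<CachedSeries> {
  const fresh = (c: CachedSeries | null): c is CachedSeries =>
    c !== null && Date.now() - c.generatedAt < TTL_MS;

  if (fresh(cachedSeries)) return cachedSeries; // 1. in-memory hit

  const fromRedis = await readRedis(); // 2. cross-instance hit
  if (fresh(fromRedis)) return (cachedSeries = fromRedis);

  if (!pendingBuild) {
    // 3. rebuild, deduplicated: concurrent requests await the same promise
    pendingBuild = build()
      .then(async (built) => {
        await writeRedis(built); // populate Redis for other instances
        cachedSeries = built; // populate this instance's memory
        return built;
      })
      .finally(() => {
        pendingBuild = null;
      });
  }
  return pendingBuild;
}
```

The `pendingBuild` promise is what keeps a burst of dashboard requests from each triggering a full ~112K-user scan on the same instance.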
Reviews (1): Last reviewed commit: "fix(admin): keep Cumulative Users scan a..."
Summary
Cumulative Users chart was showing "no data available" on production. Cloud Run logs showed repeated `Uncaught signal: 6, pid=1 (SIGABRT)` + "HTTP response was malformed" around every `/api/omi/stats/daily-new-users` hit — the Node process was OOM-killed while iterating ~112K Firebase Auth users on a 512Mi container.

Fixes

- Persist the computed daily series to Redis under a 30-minute TTL so only one instance pays the full scan cost per window.
- Yield to the event loop between pages with `setImmediate` so V8 can collect the previous batch of UserRecord objects before the next `listUsers()` call arrives, keeping peak memory flat across the scan.
- Bump the Cloud Run revision to `--memory=1Gi --cpu=1` in `gcp_admin.yml`. The live revision was already hot-patched via `gcloud run services update`, so production is already back up before this PR merges.

Test plan

- `gcp_admin.yml` deploy uses `--memory=1Gi`
- `/dashboard/analytics` in production: Cumulative Users chart renders with all three filter windows
- `gcloud logging read ... severity>=ERROR` shows no signal-6 crashes after deploy

🤖 Generated with Claude Code