fix: reliably deliver CDN opt-in notification email#2340

Open
khushboo5723 wants to merge 13 commits into main from make-email-sync

khushboo5723 (Contributor) commented May 6, 2026

Summary

Await the opt-in notification work (IMS lookup + S3 read + email send) on the
request path so the email reliably goes out, and parallelise the prep calls
to keep the added latency at ~500ms. This is acceptable as a short-term fix
because opting in is a one-time user action, not a hot path.

Follow-up: move the notification to an SQS-driven flow so the API call doesn't
have to wait at all. Tracked in LLMO-4814.

Issue

The opt-in email logic was wrapped in a detached async IIFE (fire-and-forget
promise) that the Lambda handler never awaited. The original assumption was
that callbackWaitsForEmptyEventLoop=true would keep the Lambda alive long
enough for the IIFE to finish.

That assumption only holds for legacy callback-style Lambda handlers.
This service uses an async handler, and with async handlers Lambda freezes
the container the moment the handler's returned promise resolves — abandoning
any detached promises mid-execution.
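
The difference between the two handler shapes can be reproduced locally without Lambda. The sketch below is illustrative only: sendOptInEmail, detachedHandler, awaitedHandler and emailSentAtResolve are hypothetical names, and a 50 ms timer stands in for the real IMS + S3 + email work. It records whether the email had gone out at the moment the handler's promise resolved, which is the point at which Lambda may freeze the container.

```javascript
// Hypothetical stand-in for the IMS lookup + S3 read + email send (~50 ms).
function sendOptInEmail(state) {
  return new Promise((resolve) => {
    setTimeout(() => {
      state.emailSent = true;
      resolve();
    }, 50);
  });
}

// Detached variant: the IIFE's promise is dropped, so the handler's own
// promise resolves before the email work has finished.
async function detachedHandler(state) {
  (async () => {
    await sendOptInEmail(state);
  })();
  return { statusCode: 200 };
}

// Awaited variant: the handler's promise resolves only after the email work.
async function awaitedHandler(state) {
  await sendOptInEmail(state);
  return { statusCode: 200 };
}

// Runs a handler and reports whether the email had been sent at the moment
// the handler's promise resolved.
async function emailSentAtResolve(handler) {
  const state = { emailSent: false };
  await handler(state);
  return state.emailSent;
}
```

With these stand-ins, emailSentAtResolve(detachedHandler) resolves to false and emailSentAtResolve(awaitedHandler) resolves to true.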

Evidence in Coralogix:

  • [edge-optimize-config] Slack notification sent (sync path log, worked correctly)
  • Zero [cdn-opt-in-notification] logs in the same invocation — the async IIFE never ran

Fix

  1. Await the notification flow inside the if (isNewlyOpted) block instead
    of detaching it. This guarantees the Lambda doesn't freeze before the email
    is sent.
  2. Parallelise the two independent prep calls — IMS profile lookup and S3
    LLMO-config read — so they overlap with saveSiteConfig instead of running
    serially after it. Total added latency drops from ~5–10s to ~500ms.
  3. Add structured step=… timing logs (ims-resolved, s3-resolved,
    save-config-done, slack-done, email-done, complete) so we can keep
    verifying latency in Coralogix.
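
A minimal sketch of the overlap described in the steps above, not the code in this PR. All function names (resolveImsProfile, readLlmoConfig, saveSiteConfig, sendOptInEmail, notifyOptIn) are hypothetical stand-ins backed by timers, and the exact log line layout (step=… elapsedMs=…) is an assumption; only the step names and the [cdn-opt-in-notification] prefix come from the description. The Slack step is left out for brevity.

```javascript
// Hypothetical stand-ins; the real service calls are not shown here.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));
const resolveImsProfile = () => delay(40, { email: "user@example.com" });
const readLlmoConfig = () => delay(40, { cdn: "example" });
const saveSiteConfig = () => delay(40);
const sendOptInEmail = (profile, config) =>
  delay(10, { to: profile.email, config });

async function notifyOptIn(log = console.log) {
  const start = Date.now();
  // Assumed log layout: prefix, step name, milliseconds since start.
  const step = (name) =>
    log(`[cdn-opt-in-notification] step=${name} elapsedMs=${Date.now() - start}`);

  // All three calls are started before any is awaited, so they overlap
  // instead of running one after another.
  const [profile, config] = await Promise.all([
    resolveImsProfile().then((p) => { step("ims-resolved"); return p; }),
    readLlmoConfig().then((c) => { step("s3-resolved"); return c; }),
    saveSiteConfig().then(() => step("save-config-done")),
  ]);

  // The email needs both prep results, so it runs after the parallel phase
  // and is awaited before the function returns.
  const result = await sendOptInEmail(profile, config);
  step("email-done");
  step("complete");
  return { result, elapsedMs: Date.now() - start };
}
```

Calling notifyOptIn() from inside the handler with await keeps the handler's promise pending until step=complete has been logged.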

Trade-off: the customer clicking "enable" now waits ~500ms longer for the API
response. Acceptable because:

  • opt-in is a one-time per-site action, not a frequent API call
  • the alternative (lost emails) blocks the downstream LLMO team workflow

Next step

Move the notification work onto an SQS queue so the API handler enqueues a
message and returns immediately. The worker consumes the message and runs
IMS + S3 + send-email asynchronously. Tracked in LLMO-4814.
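
One possible shape for that follow-up, sketched under assumptions rather than taken from LLMO-4814. The function names, the message fields (type, siteId) and the deps object are all hypothetical. sendMessage is an injected wrapper so the sketch runs without the AWS SDK; with AWS SDK v3 it could be (input) => sqsClient.send(new SendMessageCommand(input)).

```javascript
// API side: build the message, enqueue it, and return. The handler waits
// only for the enqueue, not for IMS, S3, or the email send.
// `sendMessage` is a hypothetical injected wrapper around the SQS client.
async function enqueueOptInNotification(sendMessage, queueUrl, siteId) {
  const input = {
    QueueUrl: queueUrl,
    // Hypothetical payload; the real message schema is not defined here.
    MessageBody: JSON.stringify({ type: "cdn-opt-in", siteId }),
  };
  await sendMessage(input);
  return input;
}

// Worker side: an SQS-triggered Lambda receives records whose `body` field
// holds the message string. `deps` bundles hypothetical service calls.
async function handleOptInRecord(record, deps) {
  const { siteId } = JSON.parse(record.body);
  // The two prep calls are independent, so they can still run in parallel.
  const [profile, config] = await Promise.all([
    deps.resolveImsProfile(siteId),
    deps.readLlmoConfig(siteId),
  ]);
  return deps.sendEmail({ to: profile.email, siteId, config });
}
```

The worker still awaits the email send before returning, so the freeze issue described above does not reappear on the consumer side.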

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions Bot commented May 6, 2026

This PR will trigger a patch release when merged.

khushboo5723 changed the title from "fix: adding logs to check the latency" to "fix: making this email call sync for now" on May 10, 2026
khushboo5723 requested a review from dipratap on May 10, 2026 22:37
khushboo5723 changed the title from "fix: making this email call sync for now" to "fix: reliably deliver CDN opt-in notification email" on May 10, 2026

2 participants