fix: reliably deliver CDN opt-in notification email#2340
Open
khushboo5723 wants to merge 13 commits into
Open
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
This PR will trigger a patch release when merged. |
…ervice into make-email-sync
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Awaiting the opt-in notification work (IMS lookup + S3 read + email send) on the
request path so the email reliably goes out, while parallelising the prep calls
to keep the added latency at ~500ms. This is acceptable as a short-term fix
because opting in is a one-time user action, not a hot path.
Follow-up: move the notification to an SQS-driven flow so the API call doesn't
have to wait at all. Tracked in LLMO-4814.
Issue
The opt-in email logic was wrapped in a detached async IIFE (fire-and-forget
promise) that the Lambda handler never awaited. The original assumption was
that
callbackWaitsForEmptyEventLoop=truewould keep the Lambda alive longenough for the IIFE to finish.
That assumption only holds for legacy callback-style Lambda handlers.
This service uses an async handler, and with async handlers Lambda freezes
the container the moment the handler's returned promise resolves — abandoning
any detached promises mid-execution.
Evidence in Coralogix:
[edge-optimize-config] Slack notification sent— sync path log (worked correctly)[cdn-opt-in-notification]logs in the same invocation — the async IIFE never ranFix
if (isNewlyOpted)block insteadof detaching it. This guarantees the Lambda doesn't freeze before the email
is sent.
LLMO-config read — so they overlap with
saveSiteConfiginstead of runningserially after it. Total added latency drops from ~5–10s to ~500ms.
step=…timing logs (ims-resolved,s3-resolved,save-config-done,slack-done,email-done,complete) so we can keepverifying latency in Coralogix.
Trade-off: the customer clicking "enable" now waits ~500ms longer for the API
response. Acceptable because:
Next step
Move the notification work onto an SQS queue so the API handler enqueues a
message and returns immediately. The worker consumes the message and runs
IMS + S3 + send-email asynchronously. Tracked in LLMO-4814.