fix(hydration): make chart-repo sync-app timeout configurable, default 30s #124
Open
aamsellem wants to merge 1 commit into beclab:main
The TaskForApiStep client had a hardcoded 3-second timeout on the call to
chart-repo's /chart-repo/api/v2/dcr/sync-app endpoint. This endpoint runs
the full hydration pipeline synchronously (download chart → render →
image analysis → database update). On first-time installs of a new chart
the call typically takes ~4 seconds end-to-end, and longer when chart-repo
is concurrently processing other sources in the background.
Symptoms observed in the field:
- market logs: "TaskForApiStep - Request failed in 3.000xxx s ...
context deadline exceeded"
- chart-repo logs for the same task: hydration steps 1/5 → 5/5 all
completed successfully a moment later
- the task is then moved to the render-failed list and the app silently
disappears from the user's market until the cache is purged manually
This commit:
- bumps the default timeout from 3s to 30s
- allows operators to override via MARKET_CHART_REPO_TIMEOUT_SECONDS
- logs a warning if the env var value is invalid
Summary
The `TaskForApiStep` HTTP client had a hardcoded 3-second timeout on its call to chart-repo's `/chart-repo/api/v2/dcr/sync-app` endpoint. That endpoint runs the full hydration pipeline synchronously (download chart → render → image analysis → DB update). On a first-time install of a new chart the call typically takes ~4 seconds end-to-end, and longer when chart-repo is concurrently processing other sources in the background.
This causes new charts (especially from third-party market sources) to silently fail to install, even though chart-repo itself completes the work successfully a moment after the market client gives up.
Symptoms (observed in production)
market pod:
```
TaskForApiStep - Request failed in 3.000439s for user=…, source=…, app=14fdab3c:
Post "http://chart-repo-service:82/chart-repo/api/v2/dcr/sync-app":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Step 1 (TaskForApiStep) failed for task: …, moving to render failed list
Removed app 14fdab3c from pending list for user: …, source: …
```
chart-repo pod (same task, ~1 second later):
```
HYDRATION STEP 5/5 COMPLETED: Database and Cache Update
HYDRATION TASK COMPLETED
App 14fdab3c rendering completed successfully
```
The result: hydration succeeded server-side but the orchestrator dropped the app, and the user has to wait for cache cleanup or manually purge Redis to retry.
Changes
- Default timeout raised from `3 * time.Second` to `30 * time.Second`
- Timeout can be overridden via the `MARKET_CHART_REPO_TIMEOUT_SECONDS` environment variable
Why 30s
Empirically, chart-repo `sync-app` returns within 4–6 seconds for new charts on a typical Olares device. 30 seconds gives a comfortable margin for cold starts, slow disks, or transient contention with the background hydration of large official catalogs (~200 apps). It is still short enough to surface a real failure quickly.
Test plan
- Install a new chart for the first time and confirm it passes `TaskForApiStep` without hitting the deadline.
- Set `MARKET_CHART_REPO_TIMEOUT_SECONDS=60` and confirm the configured value is used.
- Set `MARKET_CHART_REPO_TIMEOUT_SECONDS=abc` and confirm a warning is logged and the default (30s) is used.