
fix(hydration): make chart-repo sync-app timeout configurable, default 30s #124

Open
aamsellem wants to merge 1 commit into beclab:main from aamsellem:fix/chart-repo-sync-timeout

@aamsellem

Summary

The TaskForApiStep HTTP client had a hardcoded 3-second timeout on its call to chart-repo's /chart-repo/api/v2/dcr/sync-app endpoint. That endpoint runs the full hydration pipeline synchronously (download chart → render → image analysis → DB update). On a first-time install of a new chart the call typically takes ~4 seconds end-to-end, and longer when chart-repo is concurrently processing other sources in the background.

This causes new charts (especially from third-party market sources) to silently fail to install, even though chart-repo itself completes the work successfully a moment after the market client gives up.

Symptoms (observed in production)

market pod:
```
TaskForApiStep - Request failed in 3.000439s for user=…, source=…, app=14fdab3c:
Post "http://chart-repo-service:82/chart-repo/api/v2/dcr/sync-app":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Step 1 (TaskForApiStep) failed for task: …, moving to render failed list
Removed app 14fdab3c from pending list for user: …, source: …
```

chart-repo pod (same task, ~1 second later):
```
HYDRATION STEP 5/5 COMPLETED: Database and Cache Update
HYDRATION TASK COMPLETED
App 14fdab3c rendering completed successfully
```

The result: hydration succeeded server-side but the orchestrator dropped the app, and the user has to wait for cache cleanup or manually purge Redis to retry.

Changes

  • bump default timeout from 3 * time.Second to 30 * time.Second
  • allow operators to override via the MARKET_CHART_REPO_TIMEOUT_SECONDS environment variable
  • log a warning (and fall back to default) if the env var value is missing/invalid
  • expand the comment block to document why this timeout matters

Why 30s

Empirically, chart-repo sync-app returns within 4–6 seconds for new charts on a typical Olares device. 30 seconds gives a comfortable margin for cold starts, slow disks, or transient contention with the background hydration of large official catalogs (~200 apps). It is still short enough to surface a real failure quickly.

Test plan

  • Install an app from a third-party market source for the first time and verify hydration completes through TaskForApiStep without hitting the deadline.
  • Set MARKET_CHART_REPO_TIMEOUT_SECONDS=60 and confirm the configured value is used.
  • Set MARKET_CHART_REPO_TIMEOUT_SECONDS=abc and confirm a warning is logged and the default (30s) is used.

…t 30s

The TaskForApiStep client had a hardcoded 3 second timeout on the call to
chart-repo's /chart-repo/api/v2/dcr/sync-app endpoint. This endpoint runs
the full hydration pipeline synchronously (download chart → render →
image analysis → database update). On first-time installs of a new chart
the call typically takes ~4 seconds end-to-end — even more when chart-repo
is concurrently processing other sources in the background.

Symptoms observed in the field:
  - market logs: "TaskForApiStep - Request failed in 3.000xxx s ...
    context deadline exceeded"
  - chart-repo logs for the same task: hydration steps 1/5 → 5/5 all
    completed successfully a moment later
  - the task is then moved to the render-failed list and the app silently
    disappears from the user's market until the cache is purged manually

This commit:
  - bumps the default timeout from 3s to 30s
  - allows operators to override via MARKET_CHART_REPO_TIMEOUT_SECONDS
  - logs a warning if the env var value is invalid