fix(lora): wait for in-flight Civitai downloads before pipeline __init__ (#937) #940
Open
livepeer-tessa wants to merge 3 commits into main
added 3 commits
April 14, 2026 06:25
…peline ID (#936) Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
…FoundError

Fixes #937.

When a session is re-initialised with a Civitai-hosted LoRA, the local Civitai download is re-triggered asynchronously while pipeline_manager concurrently calls LongLivePipeline.__init__. load_lora_weights was called synchronously before the download completed, causing a spurious FileNotFoundError that failed the pipeline load and left the session needing a manual retry.

Fix: add _wait_for_lora_file() in lora/utils.py that polls for the file's existence before raising. The timeout defaults to 120 s and is configurable via SCOPE_LORA_DOWNLOAD_WAIT_TIMEOUT. No change to callers; existing FileNotFoundError semantics are preserved when the file genuinely does not appear within the timeout.

Also adds tests/test_lora_wait_for_file.py covering:
- file already present → returns immediately
- file appears during wait → returns True
- file never appears → returns False / raises after timeout
- timeout=0 → disables waiting
- SCOPE_LORA_DOWNLOAD_WAIT_TIMEOUT env var override

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
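The single-file wait described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in lora/utils.py: the function names (no leading underscore), the poll_interval parameter, the boolean return value, and the error message are all hypothetical. Only the 120 s default, the SCOPE_LORA_DOWNLOAD_WAIT_TIMEOUT variable, and the timeout=0 behaviour come from the commit message.

```python
import os
import time
from pathlib import Path

DEFAULT_TIMEOUT_S = 120.0  # default stated in the commit message
ENV_VAR = "SCOPE_LORA_DOWNLOAD_WAIT_TIMEOUT"


def wait_for_lora_file(path, timeout=None, poll_interval=0.5):
    """Return True once `path` exists, False if it is still missing at timeout.

    `poll_interval` is an assumed parameter; the commit does not state one.
    """
    if timeout is None:
        # Environment override, falling back to the 120 s default.
        timeout = float(os.environ.get(ENV_VAR, DEFAULT_TIMEOUT_S))
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return True  # already present, or appeared during the wait
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # timeout=0 lands here after a single check
        time.sleep(min(poll_interval, remaining))


def require_lora_file(path, timeout=None):
    """Hypothetical caller-side wrapper that keeps FileNotFoundError semantics."""
    if not wait_for_lora_file(path, timeout):
        raise FileNotFoundError(f"LoRA file not found: {path}")
    return Path(path)
```

Using time.monotonic() for the deadline keeps the wait immune to wall-clock adjustments, and checking existence before sleeping means a file that is already on disk costs one stat call.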
On session reinitialisation the frontend concurrently (a) re-downloads a Civitai-hosted LoRA and (b) calls POST /api/v1/pipeline/load, which triggers LongLivePipeline.__init__ -> _init_loras(). If the pipeline __init__ wins the race, the file doesn't exist yet and PeftLoRAStrategy.load_adapters_from_list raises FileNotFoundError, which surfaces as a spurious 'Some pipelines failed to load' error. The session recovers on the next retry (~60-90 s later) once the download completes.

Fix: add _wait_for_lora_files() to LoRAEnabledPipeline._init_loras(). Before delegating to LoRAManager it polls for each missing LoRA file up to 120 s (poll every 2 s). Files already present are skipped immediately, so there is zero overhead on the normal (warm cache) path. After the timeout a warning is logged and execution continues; the strategy loader still raises its own error for genuinely missing files.

Fixes #937

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
Problem
On session reinitialisation the frontend concurrently issues:
- POST /api/v1/loras to re-download the Civitai LoRA (async)
- POST /api/v1/pipeline/load → LongLivePipeline.__init__ → _init_loras() (sync)

If the pipeline load wins the race, the file doesn't exist yet and PeftLoRAStrategy.load_adapters_from_list raises FileNotFoundError, surfacing as a spurious "Some pipelines failed to load" error. The session self-heals on retry once the download completes (~60–90 s), but the failed attempt wastes time and pollutes the error logs.

This is distinct from:
/tmp/ cleared between jobs)

Observed in job a8a03ca5-6fce-4cdd-8bca-580e8fbafeeb (scope-app--prod) on 2026-04-13 ~23:38 UTC.

Fix
Add _wait_for_lora_files() to LoRAEnabledPipeline._init_loras() in mixin.py.
- Before delegating to LoRAManager, the function polls for each missing LoRA file at a 2 s interval, for up to 120 s.
- Zero overhead on the normal (warm cache) path: files that already exist are skipped immediately via a Path.exists() check before entering the loop.
- After the timeout a warning is logged and execution continues. The strategy loader still raises for genuinely missing files, so error behaviour for permanent failures is unchanged.
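The polling behaviour described above can be sketched as follows. This is a minimal illustration, not the code in mixin.py: the function name (no leading underscore), its parameters, the log message, and the choice to return the still-missing paths are assumptions. The 2 s interval, the 120 s cap, the Path.exists() pre-check, and the warn-and-continue behaviour are taken from the description.

```python
import logging
import time
from pathlib import Path

logger = logging.getLogger(__name__)


def wait_for_lora_files(paths, timeout=120.0, poll_interval=2.0):
    """Poll until every path exists or `timeout` seconds elapse.

    Returns the paths still missing (empty list on success). The return
    value is an assumption made for this sketch.
    """
    # Pre-check: files already on disk never enter the polling loop.
    pending = [Path(p) for p in paths if not Path(p).exists()]
    if not pending:
        return []  # warm cache path, no sleep at all

    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(min(poll_interval, remaining))
        pending = [p for p in pending if not p.exists()]

    if pending:
        # Warn and continue: the downstream loader is expected to raise
        # its own FileNotFoundError for files that never arrive.
        logger.warning(
            "LoRA files still missing after %.0f s: %s",
            timeout,
            ", ".join(str(p) for p in pending),
        )
    return pending
```

Sharing one deadline across all files bounds the total wait at the timeout regardless of how many LoRAs are pending, rather than multiplying it per file.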
Testing
- pending is empty, function returns immediately.
- PeftLoRAStrategy still raises FileNotFoundError → propagates as before.

Closes #937