Fix LocalBackend fork_checkpoint to overwrite initial LoRA for vLLM #652 (Open)

Conversation
When forking a checkpoint, the source checkpoint was copied to
checkpoints/{source_step} in the destination model directory. However,
model.register(backend) already created an empty LoRA at checkpoints/0000.
When vLLM starts, it loads @0 — the empty 0000 checkpoint — not the
forked one. Fix by also copying the forked weights to checkpoints/0000
so vLLM loads the correct weights on startup.
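The copy described above can be sketched as follows. This is a minimal, stdlib-only illustration of the idea, not the actual `LocalBackend` code; the helper name `copy_fork_to_step_zero` and the directory layout details are assumptions based on the paths mentioned in the description.

```python
import shutil
import tempfile
from pathlib import Path

def copy_fork_to_step_zero(dest_model_dir: Path, source_step: int) -> None:
    """Overwrite the empty step-0 LoRA with the forked weights so the
    vLLM @0 alias resolves to the real adapter on startup."""
    checkpoints = dest_model_dir / "checkpoints"
    forked = checkpoints / f"{source_step:04d}"
    step_zero = checkpoints / "0000"
    # Replace the placeholder adapter created by model.register(backend).
    if step_zero.exists():
        shutil.rmtree(step_zero)
    shutil.copytree(forked, step_zero)
```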
Fixes #651
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kovbo approved these changes on Apr 13, 2026
The real issue is that UnslothService._state (a cached_property) may be initialized before the fork copies the checkpoint, caching the base model instead of the forked weights. Invalidating the cache after fork ensures the trainer picks up the forked checkpoint on next access. The step-0 overwrite was unnecessary — vLLM's start_openai_server already calls get_last_checkpoint_dir() which finds the forked checkpoint at its original step number.
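The invalidation described above relies on how `functools.cached_property` stores its value. A minimal sketch, assuming a simplified `UnslothService` (the `invalidate_state` method and the state payload are illustrative, not the real service API):

```python
from functools import cached_property

class UnslothService:
    """Sketch: _state is initialized lazily and cached on first access."""

    def __init__(self):
        self.loads = 0

    @cached_property
    def _state(self):
        # Stands in for expensive trainer/model initialization that reads
        # whatever checkpoint exists at access time.
        self.loads += 1
        return f"state-{self.loads}"

    def invalidate_state(self):
        # cached_property stores its value in the instance __dict__;
        # popping the entry forces re-initialization on the next access,
        # so a fork performed after first access is picked up.
        self.__dict__.pop("_state", None)
```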
After _experimental_fork_checkpoint, store the checkpoint path on the service. On the first _train_dedicated/_train_shared call, load the adapter weights via load_lora_adapter before training begins. This is needed because create_unsloth_train_context may initialize the LoRA architecture from adapter_config.json without loading the actual trained weights from adapter_model.safetensors, especially when the checkpoint was trained at a different precision than the current load config.
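The deferred-load pattern described above can be sketched like this. All names here are illustrative stand-ins: the real code would call `load_lora_adapter` / `set_peft_model_state_dict` where this sketch records the path.

```python
class TrainerSketch:
    """Sketch of loading forked adapter weights lazily, on the first
    training call rather than at fork time."""

    def __init__(self):
        self._forked_checkpoint_dir = None  # set by the fork operation
        self._adapter_loaded = False
        self.loaded_from = None

    def _load_adapter(self, checkpoint_dir):
        # In the real service this would read adapter_model.safetensors
        # and apply it via set_peft_model_state_dict.
        self.loaded_from = checkpoint_dir

    def train(self, batch):
        # Load the forked adapter exactly once, before the first step,
        # so the LoRA architecture set up from adapter_config.json gets
        # the actual trained weights.
        if self._forked_checkpoint_dir and not self._adapter_loaded:
            self._load_adapter(self._forked_checkpoint_dir)
            self._adapter_loaded = True
        return "trained"
```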
Force-pushed 5489fde to 0d53531
Force-pushed 0d53531 to 62e4fbc
On H200 GPUs, base model activations run in bf16 while LoRA adapter weights are fp16. Unsloth's fused matmul_lora and fast_linear_forward call addmm_/addmv_ which crash on mixed dtypes. This patch casts tensors to a common dtype before those ops. Applied automatically when UnslothService._state is first accessed.
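The cast-before-fused-op idea in the commit above can be sketched in stdlib-only Python (dtype names as strings stand in for torch dtypes; `resolve_common_dtype`, `patch_fused_op`, and the promotion rules are illustrative assumptions, not Unsloth's actual code):

```python
import types

# Preference order for promotion; bf16 beats fp16 because the base-model
# activations run in bf16 on H200 in this setup.
_RANK = {"float16": 1, "bfloat16": 2, "float32": 3}

def resolve_common_dtype(a: str, b: str) -> str:
    """Pick the dtype both operands are cast to before a fused op."""
    if a == b:
        return a
    # Mixed half precision: cast both to bf16 instead of crashing in addmm_.
    if {a, b} == {"float16", "bfloat16"}:
        return "bfloat16"
    return max(a, b, key=lambda d: _RANK.get(d, 0))

def patch_fused_op(module, name, cast):
    """Monkeypatch a fused op (e.g. a matmul_lora-style function) so its
    positional inputs are cast before the original is called."""
    original = getattr(module, name)
    def patched(*args, **kwargs):
        return original(*(cast(a) for a in args), **kwargs)
    setattr(module, name, patched)
```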
Force-pushed 9346894 to e0decea
Force-pushed e0decea to ba4c406
Problem

LocalBackend._experimental_fork_checkpoint has two issues that prevent forked LoRA checkpoints from being used correctly:

1. vLLM loads the wrong checkpoint

model.register(backend) creates an empty LoRA at checkpoints/0000. The fork then copies the real weights to checkpoints/{source_step} (e.g. 0686). start_openai_server calls get_last_checkpoint_dir(), which finds the forked checkpoint, but the vLLM subprocess is configured with an @0 alias pointing at 0000.

2. Trainer uses the wrong weights

UnslothService._state is a cached_property that may be initialized before the fork runs. Even when it re-initializes after the fork, create_unsloth_train_context calls FastLanguageModel.from_pretrained(model_name=checkpoint_dir), which sets up the LoRA architecture but may not load the trained weights correctly across precision boundaries (e.g. a checkpoint trained in 4-bit, loaded in 16-bit).

3. Mixed bf16/fp16 dtype crash on H200

On H200 GPUs, base model activations run in bf16 while LoRA adapter weights are fp16. Unsloth's fused matmul_lora and fast_linear_forward call addmm_/addmv_, which crash on mixed dtypes: RuntimeError: self and mat2 must have the same dtype, but got Half and BFloat16.

Fix
Checkpoint loading (backend.py)

- Overwrite checkpoints/0000 with the forked weights so vLLM loads the correct adapter on startup.
- Invalidate the _state cache so the trainer re-initializes with the forked checkpoint path.
- Store _forked_checkpoint_dir on the service and call load_adapter_from_checkpoint on the first training call to explicitly load the adapter weights via set_peft_model_state_dict.

Dtype patch (dtype_patch.py)

Patches matmul_lora and fast_linear_forward to cast tensors to a common dtype (preferring bf16) before fused ops. Applied automatically when _state is first accessed.

Verification
Without fix: val/qa_failed = 40-80% at step 0 (base model behavior), plus dtype crashes.
With fix: val/qa_failed = 7%, matching the W&B Inference baseline (8%), with no crashes.

Closes #651