Commit ce19267

davidmcc73 and claude committed
Pass temperature and alpha to MTP speculative decoding
Default temp=0.7 (matching exo's default) so probabilistic acceptance runs correctly. Configurable via the EXO_SPECULATIVE_TEMP and EXO_SPECULATIVE_ALPHA env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8a65a51 commit ce19267
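
For readers unfamiliar with the "probabilistic acceptance" the commit message mentions, here is a minimal sketch of the standard speculative-sampling rule, where a drafted token is accepted with probability min(1, p/q) for target probability p and draft probability q. It is not taken from `MTPBatchGenerator`, whose internals are not shown in this commit. The function names are hypothetical, and the role given to `alpha` (a multiplier on the acceptance ratio, so that `alpha=1.0` reduces to the standard rule) is purely an assumption for illustration.

```python
import math
import random


def softmax(logits, temp):
    """Temperature-scaled softmax over a list of logits (temp must be > 0)."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def accept_probability(target_logits, draft_logits, token, temp=0.7, alpha=1.0):
    """Probability of keeping a drafted token (hypothetical helper).

    ASSUMPTION: alpha scales the acceptance ratio; the real meaning of
    alpha in exo's MTP code is not visible in this diff.
    """
    if temp <= 0:
        # Greedy limit: keep the token only if it is the target's argmax.
        best = max(range(len(target_logits)), key=target_logits.__getitem__)
        return 1.0 if best == token else 0.0
    p = softmax(target_logits, temp)[token]
    q = softmax(draft_logits, temp)[token]
    if q == 0.0:
        # Underflow guard; a draft would not normally sample such a token.
        return 1.0
    return min(1.0, alpha * p / q)


def accept(target_logits, draft_logits, token, temp=0.7, alpha=1.0, rng=random):
    """Draw one accept/reject decision for a drafted token."""
    prob = accept_probability(target_logits, draft_logits, token, temp, alpha)
    return rng.random() < prob
```

With `temp <= 0` the sketch falls back to exact argmax matching, which illustrates why a positive default temperature matters if the acceptance test is meant to be probabilistic.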

1 file changed

Lines changed: 4 additions & 0 deletions

src/exo/worker/engines/mlx/generator/batch_generate.py

@@ -93,10 +93,14 @@ def __post_init__(self) -> None:
 
         if mtp_weights and os.path.exists(mtp_weights):
             mtp = MTPPredictor(self.model, mtp_weights, quantize=False)
+            temp = float(os.environ.get("EXO_SPECULATIVE_TEMP", "0.7"))
+            alpha = float(os.environ.get("EXO_SPECULATIVE_ALPHA", "1.0"))
             self._exo_gen = MTPBatchGenerator(
                 model=self.model,
                 mtp_predictor=mtp,
                 gamma=gamma,
+                temp=temp,
+                alpha=alpha,
                 stop_tokens=stop_tokens,
                 prefill_step_size=4096,
             )
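
The two added lines call `float()` directly on the environment value, so a malformed setting such as `EXO_SPECULATIVE_TEMP=warm` would raise a bare `ValueError` that does not name the variable. One possible hardening, sketched below, wraps the lookup in a small helper. `read_float_env` is a hypothetical name and is not part of exo. Unlike the diff, it also treats an empty string as "unset", which is an assumption about the desired behavior.

```python
import os


def read_float_env(name, default, environ=None):
    """Return the float value of an environment variable, or `default` if unset.

    Hypothetical helper, not part of exo. `environ` can be any mapping and
    defaults to os.environ, which makes the function easy to test.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Unset or empty: fall back to the default.
        return default
    try:
        return float(raw)
    except ValueError:
        # Re-raise with the variable name so the bad setting is easy to find.
        raise ValueError(f"{name} must be a number, got {raw!r}") from None


# Same defaults as the diff: 0.7 for temperature, 1.0 for alpha.
temp = read_float_env("EXO_SPECULATIVE_TEMP", 0.7)
alpha = read_float_env("EXO_SPECULATIVE_ALPHA", 1.0)
```

Either way, the variables are read when `__post_init__` runs, so they need to be set in the environment before the worker process starts.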
