whisper : add --seg-len-hint to discourage progressively shorter segments #3742
Open
lizthegrey wants to merge 2 commits into ggml-org:master from
Conversation
whisper : add --seg-len-hint to discourage progressively shorter segments

When processing long audio, whisper tends to produce progressively shorter segments because timestamp tokens in the decoder prompt context condition the model to insert more frequent segment breaks.

Add a seg_len_hint parameter (in ms) that thins timestamp tokens in the rolling prompt context, keeping at most one per seg_len_hint interval. This breaks the feedback loop while preserving text tokens for continuity. The model can still break on natural boundaries (speaker turns, pauses) — the hint only affects context conditioning, not the actual segment creation.

Usage: --seg-len-hint 2000 (for ~2 second target segments)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
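The thinning described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the timestamp-token layout (a TS_BEGIN id with 20 ms steps) mirrors whisper's vocabulary convention, but the constants, function name, and exact keep/drop policy here are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed constants, mirroring whisper's timestamp token layout (illustrative).
constexpr int32_t TS_BEGIN   = 50364; // first timestamp token id
constexpr int32_t TS_STEP_MS = 20;    // each timestamp token advances 20 ms

// Keep all text tokens; among timestamp tokens, keep at most one per
// seg_len_hint_ms interval. seg_len_hint_ms == 0 disables thinning.
std::vector<int32_t> thin_timestamps(const std::vector<int32_t> & prompt,
                                     int seg_len_hint_ms) {
    if (seg_len_hint_ms <= 0) {
        return prompt;
    }
    std::vector<int32_t> out;
    int64_t next_keep_ms = 0;
    for (int32_t tok : prompt) {
        if (tok < TS_BEGIN) {
            out.push_back(tok); // text token: always preserved for continuity
            continue;
        }
        const int64_t t_ms = int64_t(tok - TS_BEGIN) * TS_STEP_MS;
        if (t_ms >= next_keep_ms) {
            out.push_back(tok); // first timestamp seen in this interval
            next_keep_ms = t_ms + seg_len_hint_ms;
        }
        // later timestamps inside the interval are dropped from the context
    }
    return out;
}
```

With a 2000 ms hint, a context carrying timestamps every few hundred milliseconds collapses to roughly one timestamp per two seconds, so the decoder no longer sees a dense history of segment breaks.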
The initial --seg-len-hint commit wired the flag into whisper-cli but not whisper-server. This commit mirrors the existing best_of / beam_size pattern at server.cpp:221-222 (CLI flag) and :505-511 (POST form field), and assigns the value to wparams.seg_len_hint during inference setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
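For illustration, the mirrored form-field handling might look like the sketch below, with a std::map standing in for the HTTP library's multipart form fields. The struct and function names here are invented for the example and are not whisper-server's actual API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the subset of request parameters this sketch touches.
struct whisper_params_sketch {
    int seg_len_hint = 0; // default 0 = thinning off, per the PR
};

// Mirror of the best_of / beam_size pattern: if the form field is present,
// parse it as an integer and store it in the request parameters.
void apply_form_fields(const std::map<std::string, std::string> & form,
                       whisper_params_sketch & params) {
    auto it = form.find("seg_len_hint");
    if (it != form.end()) {
        params.seg_len_hint = std::stoi(it->second);
    }
}
```

A client would then pass the field the same way as best_of, e.g. an extra multipart form field seg_len_hint=2000 on the POST request.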
Summary
When processing very long audio (multi-hour streams, podcasts, etc.) — particularly content with run-on sentences and few natural pauses — whisper tends to produce progressively shorter segments. This happens because timestamp tokens accumulate in the decoder's rolling prompt context (prompt_past), conditioning the model to insert segment breaks more frequently. Over time this creates a feedback loop where short segments beget shorter segments, eventually degrading to one word per line.

This PR adds a seg_len_hint parameter (in milliseconds) that thins timestamp tokens in the rolling prompt context, keeping at most one per seg_len_hint interval. Text tokens are always preserved for continuity. The model can still break segments on natural boundaries (speaker turns, pauses) — the hint only affects context conditioning, not actual segment creation. Short segments are still produced where genuinely appropriate.

- New seg_len_hint field in whisper_full_params (default 0 = off)
- CLI flag --seg-len-hint N / -slh N
- Compatible with --max-len 1 word-level timestamp mode

Alternatives considered
Post-processing merge of short segments: After decoding completes, merge adjacent short segments until they reach a minimum character count, flushing on punctuation/time gaps/speaker turns. Discarded because it does not address the root cause — the decoder still wastes inference cycles producing one-word segments, and any merging heuristic either prevents genuinely short segments from appearing (e.g. rhetorical pauses) or requires complex rules about when merging is appropriate.
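For concreteness, the discarded merge heuristic might have looked something like this sketch (all names and thresholds are hypothetical). Even this minimal version already needs a gap threshold and a length cutoff, and it still cannot distinguish a genuinely short segment from decoder degeneration:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct segment {
    int64_t t0_ms;
    int64_t t1_ms;
    std::string text;
};

// Merge each segment into its predecessor while the predecessor is still
// shorter than min_chars, flushing early when a time gap exceeds max_gap_ms.
std::vector<segment> merge_short_segments(const std::vector<segment> & in,
                                          size_t min_chars, int64_t max_gap_ms) {
    std::vector<segment> out;
    for (const segment & s : in) {
        const bool can_merge = !out.empty()
            && out.back().text.size() < min_chars        // previous still short
            && s.t0_ms - out.back().t1_ms <= max_gap_ms; // no long pause between
        if (can_merge) {
            out.back().text  += s.text;
            out.back().t1_ms  = s.t1_ms;
        } else {
            out.push_back(s);
        }
    }
    return out;
}
```

Note that the one-word segments have already been decoded by the time this runs, which is exactly the wasted-inference objection raised above.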
Enforcing a minimum segment length: Suppress timestamp tokens during decoding if the current segment is too short. Discarded for similar reasons — it fights the model's output rather than fixing the conditioning that causes the problem, and it prevents the model from producing short segments where the audio genuinely warrants them.
The approach taken here (thinning timestamps in the prompt context) addresses the feedback loop at its source: the model no longer sees a dense history of frequent segment breaks, so it stops being primed to produce more of them.
Test plan
- --seg-len-hint 2000 on long audio — segments stay at natural clause/sentence length
- --seg-len-hint 0 (default) — no behavior change
- --max-len 1 — word-level timestamps still work correctly