fix: use PreTrainedTokenizerFast wrapper for MLXLM in xgrammar and llguidance backends#1853
Open
octo-patch wants to merge 1 commit into dottxt-ai:main from
Conversation
…guidance backends

The xgrammar and llguidance backends were passing `model.mlx_tokenizer._tokenizer` (the raw `tokenizers.Tokenizer` backend) to `xgr.TokenizerInfo.from_huggingface` and `llguidance.hf.from_tokenizer` respectively. These APIs expect a `PreTrainedTokenizerFast` wrapper object that exposes attributes like `eos_token_id`, `eos_token`, and `all_special_tokens`.

The outlines_core backend already correctly uses `model.mlx_tokenizer` (the wrapper). This commit makes xgrammar and llguidance consistent with that pattern.

This is the same underlying issue as dottxt-ai#1847, which reported that `TransformerTokenizer` in `mlxlm.py` also reads from the raw backend instead of the wrapper.
Fixes the same class of bug reported in #1847, which noted that code in `mlxlm.py` was passing the raw `tokenizers.Tokenizer` backend instead of the `PreTrainedTokenizerFast` wrapper to downstream APIs.
Problem
Two backends had the same incorrect pattern when handling MLXLM models:
- `xgrammar.py` (line 129): passed `model.mlx_tokenizer._tokenizer` to `xgr.TokenizerInfo.from_huggingface`
- `llguidance.py` (lines 226-227): passed `model.mlx_tokenizer._tokenizer` to `llguidance.hf.from_tokenizer`
Both `xgr.TokenizerInfo.from_huggingface` and `llguidance.hf.from_tokenizer` expect a `PreTrainedTokenizerFast` wrapper (which has `eos_token_id`, `eos_token`, `all_special_tokens`, etc.), not the raw `tokenizers.Tokenizer` backend.
With modern `transformers` / `tokenizers` >= 0.25, the raw backend no longer exposes these attributes, causing `AttributeError` when MLXLM is used with the xgrammar or llguidance backends.
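To see the attribute gap concretely, here is a minimal, self-contained sketch (the tiny vocabulary is invented purely for illustration) showing that the raw `tokenizers.Tokenizer` backend lacks the HF-style special-token attributes, while the `PreTrainedTokenizerFast` wrapper provides them:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import PreTrainedTokenizerFast

# Build a tiny in-memory tokenizer; the vocabulary is illustrative only.
raw = Tokenizer(WordLevel({"hello": 0, "world": 1, "</s>": 2}, unk_token="</s>"))

# The raw backend does not expose HF-style special-token attributes.
assert not hasattr(raw, "eos_token_id")

# The wrapper exposes the attributes the grammar backends rely on.
wrapped = PreTrainedTokenizerFast(tokenizer_object=raw, eos_token="</s>")
assert wrapped.eos_token == "</s>"
assert wrapped.eos_token_id == 2
assert "</s>" in wrapped.all_special_tokens
```

This is why passing `model.mlx_tokenizer` (the wrapper) works while passing `model.mlx_tokenizer._tokenizer` (the raw backend) raises `AttributeError`.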
Solution
Replace `model.mlx_tokenizer._tokenizer` with `model.mlx_tokenizer` in both backends. This is consistent with the outlines_core backend, which already passes `model.mlx_tokenizer` (the wrapper).
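In diff terms, the change at each call site is a one-argument swap. The fragment below is a sketch, not the verbatim patch; the exact surrounding code in `xgrammar.py` and `llguidance.py` may differ:

```python
# xgrammar.py -- before (commented out) / after
# tokenizer_info = xgr.TokenizerInfo.from_huggingface(model.mlx_tokenizer._tokenizer)
tokenizer_info = xgr.TokenizerInfo.from_huggingface(model.mlx_tokenizer)

# llguidance.py -- before (commented out) / after
# ll_tokenizer = llguidance.hf.from_tokenizer(model.mlx_tokenizer._tokenizer)
ll_tokenizer = llguidance.hf.from_tokenizer(model.mlx_tokenizer)
```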
Testing
Existing MLX integration tests in `tests/backends/test_xgrammar.py` and `tests/backends/test_llguidance.py` cover these code paths when run on Apple Silicon (`HAS_MLX=True`). The backends' MLXLM branches are marked `# pragma: no cover` as they require Apple Silicon to execute.