
fix: use PreTrainedTokenizerFast wrapper for MLXLM in xgrammar and llguidance backends #1853

Open
octo-patch wants to merge 1 commit into dottxt-ai:main from octo-patch:fix/xgrammar-mlxlm-tokenizer

Conversation

@octo-patch

Fixes the same class of bug reported in #1847, which noted that code in `mlxlm.py` was passing the raw `tokenizers.Tokenizer` backend instead of the `PreTrainedTokenizerFast` wrapper to downstream APIs.

Problem

Two backends had the same incorrect pattern when handling MLXLM models:

`xgrammar.py` (line 129):

# Before (wrong):
tokenizer = model.mlx_tokenizer._tokenizer  # raw tokenizers.Tokenizer
xgr.TokenizerInfo.from_huggingface(tokenizer, ...)

`llguidance.py` (lines 226-227):

# Before (wrong):
llguidance.hf.from_tokenizer(model.mlx_tokenizer._tokenizer)  # raw backend

Both `xgr.TokenizerInfo.from_huggingface` and `llguidance.hf.from_tokenizer` expect a `PreTrainedTokenizerFast` wrapper (which has `eos_token_id`, `eos_token`, `all_special_tokens`, etc.), not the raw `tokenizers.Tokenizer` backend.

With modern `transformers` / `tokenizers` (>= 0.25), the raw backend no longer exposes these attributes, so using MLXLM with the xgrammar or llguidance backends raises an `AttributeError`.
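A minimal, self-contained demonstration of the distinction (this builds a tiny toy tokenizer rather than loading a real MLXLM model; `raw` stands in for `model.mlx_tokenizer._tokenizer` and `wrapper` for `model.mlx_tokenizer`):

```python
from tokenizers import Tokenizer, models
from transformers import PreTrainedTokenizerFast

# A tiny raw backend tokenizer (stand-in for model.mlx_tokenizer._tokenizer).
raw = Tokenizer(models.WordLevel({"hi": 0, "<eos>": 1}, unk_token="hi"))

# The raw backend carries no special-token metadata:
print(hasattr(raw, "eos_token_id"))  # False

# The PreTrainedTokenizerFast wrapper does -- this is the object the
# grammar backends expect to receive:
wrapper = PreTrainedTokenizerFast(tokenizer_object=raw, eos_token="<eos>")
print(wrapper.eos_token_id)  # 1
```

The same `hasattr` check failing on the raw backend is what surfaces as the `AttributeError` described above.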

Solution

Replace `model.mlx_tokenizer._tokenizer` with `model.mlx_tokenizer` in both backends. This is consistent with:

  • The outlines_core backend, which already correctly uses `model.mlx_tokenizer` (line 200 of `outlines_core.py`)
  • The Transformers model path in both backends, which uses `model.hf_tokenizer` (the `PreTrainedTokenizerFast` wrapper)
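For reference, a sketch of the corrected calls in the PR's before/after style (the `...` arguments are elided, as in the snippets above):

# After (fixed), xgrammar.py:
tokenizer = model.mlx_tokenizer  # PreTrainedTokenizerFast wrapper
xgr.TokenizerInfo.from_huggingface(tokenizer, ...)

# After (fixed), llguidance.py:
llguidance.hf.from_tokenizer(model.mlx_tokenizer)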

Testing

Existing MLX integration tests in `tests/backends/test_xgrammar.py` and `tests/backends/test_llguidance.py` cover these code paths when run on Apple Silicon (`HAS_MLX=True`). The backends' MLXLM branches are marked `# pragma: no cover`, as they require Apple Silicon to execute.

fix: use PreTrainedTokenizerFast wrapper for MLXLM in xgrammar and llguidance backends

The xgrammar and llguidance backends were passing `model.mlx_tokenizer._tokenizer`
(the raw `tokenizers.Tokenizer` backend) to `xgr.TokenizerInfo.from_huggingface`
and `llguidance.hf.from_tokenizer` respectively. These APIs expect a
`PreTrainedTokenizerFast` wrapper object that exposes attributes like
`eos_token_id`, `eos_token`, and `all_special_tokens`.

The outlines_core backend already correctly uses `model.mlx_tokenizer` (the
wrapper). This commit makes xgrammar and llguidance consistent with that pattern.

This is the same underlying issue as dottxt-ai#1847, which reported that
`TransformerTokenizer` in `mlxlm.py` also reads from the raw backend instead of
the wrapper.
