fix: use PreTrainedTokenizerFast wrapper for MLXLM in xgrammar and llguidance backends#1853
Open
octo-patch wants to merge 1 commit into dottxt-ai:main from
Conversation
…guidance backends

The xgrammar and llguidance backends were passing `model.mlx_tokenizer._tokenizer` (the raw `tokenizers.Tokenizer` backend) to `xgr.TokenizerInfo.from_huggingface` and `llguidance.hf.from_tokenizer` respectively. These APIs expect a `PreTrainedTokenizerFast` wrapper object that exposes attributes like `eos_token_id`, `eos_token`, and `all_special_tokens`.

The outlines_core backend already correctly uses `model.mlx_tokenizer` (the wrapper). This commit makes xgrammar and llguidance consistent with that pattern.

This is the same underlying issue as dottxt-ai#1847, which reported that `TransformerTokenizer` in `mlxlm.py` also reads from the raw backend instead of the wrapper.
Fixes the same class of bug reported in #1847, which noted that code in `mlxlm.py` was passing the raw `tokenizers.Tokenizer` backend instead of the `PreTrainedTokenizerFast` wrapper to downstream APIs.
Problem
Two backends had the same incorrect pattern when handling MLXLM models:
- `xgrammar.py` (line 129): passed `model.mlx_tokenizer._tokenizer` to `xgr.TokenizerInfo.from_huggingface`
- `llguidance.py` (lines 226-227): passed `model.mlx_tokenizer._tokenizer` to `llguidance.hf.from_tokenizer`
Both `xgr.TokenizerInfo.from_huggingface` and `llguidance.hf.from_tokenizer` expect a `PreTrainedTokenizerFast` wrapper (which has `eos_token_id`, `eos_token`, `all_special_tokens`, etc.), not the raw `tokenizers.Tokenizer` backend.
With modern `transformers` / `tokenizers` >= 0.25, the raw backend no longer exposes these attributes, causing `AttributeError` when MLXLM is used with the xgrammar or llguidance backends.
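To see the attribute gap concretely, here is a minimal, self-contained sketch (the tiny vocabulary is invented purely for illustration) showing that the raw `tokenizers.Tokenizer` backend lacks the HF-style special-token attributes, while the `PreTrainedTokenizerFast` wrapper provides them:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import PreTrainedTokenizerFast

# Build a tiny in-memory tokenizer; the vocabulary is illustrative only.
raw = Tokenizer(WordLevel({"hello": 0, "world": 1, "</s>": 2}, unk_token="</s>"))

# The raw backend does not expose HF-style special-token attributes.
assert not hasattr(raw, "eos_token_id")

# The wrapper exposes the attributes the grammar backends rely on.
wrapped = PreTrainedTokenizerFast(tokenizer_object=raw, eos_token="</s>")
assert wrapped.eos_token == "</s>"
assert wrapped.eos_token_id == 2
assert "</s>" in wrapped.all_special_tokens
```

This is why passing `model.mlx_tokenizer` (the wrapper) works while passing `model.mlx_tokenizer._tokenizer` (the raw backend) raises `AttributeError`.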
Solution
Replace `model.mlx_tokenizer._tokenizer` with `model.mlx_tokenizer` in both backends. This is consistent with the outlines_core backend, which already passes `model.mlx_tokenizer` (the wrapper).
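In diff terms, the change at each call site is a one-argument swap. The fragment below is a sketch, not the verbatim patch; the exact surrounding code in `xgrammar.py` and `llguidance.py` may differ:

```python
# xgrammar.py -- before (commented out) / after
# tokenizer_info = xgr.TokenizerInfo.from_huggingface(model.mlx_tokenizer._tokenizer)
tokenizer_info = xgr.TokenizerInfo.from_huggingface(model.mlx_tokenizer)

# llguidance.py -- before (commented out) / after
# ll_tokenizer = llguidance.hf.from_tokenizer(model.mlx_tokenizer._tokenizer)
ll_tokenizer = llguidance.hf.from_tokenizer(model.mlx_tokenizer)
```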
Testing
Existing MLX integration tests in `tests/backends/test_xgrammar.py` and `tests/backends/test_llguidance.py` cover these code paths when run on Apple Silicon (`HAS_MLX=True`). The backends' MLXLM branches are marked `# pragma: no cover` as they require Apple Silicon to execute.