[XPU] Support Guided Decoding for xpu, Also fix import errors on XPU when torch is installed. #7531
Open
Jiajun-Ji wants to merge 1 commit into PaddlePaddle:develop
Conversation
Thanks for your contribution!
Contributor
Pull request overview
This PR enables Guided Decoding (Structured Outputs) on the XPU path and fixes import/runtime failures on XPU caused by Triton compatibility-layer initialization when torch is installed.
Changes:
- Hook the guided decoding backend into XPUModelRunner and add guided decoding pre/post processing steps around sampling
- Adjust the Triton compatibility driver initialization condition so the driver is not created/used in non-CUDA scenarios (e.g. XPU)
- Lift the config-level restriction on guided decoding for XPU and update the related warning logic
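The driver-initialization guard described in the second bullet can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `should_init_triton_driver` is hypothetical, and the two checks (torch importable, Paddle compiled with CUDA) are inferred from the change description.

```python
import importlib.util


def should_init_triton_driver() -> bool:
    """Create the Triton compatibility driver only when torch is importable
    AND Paddle reports a CUDA build, so XPU + torch setups skip it."""
    if importlib.util.find_spec("torch") is None:
        return False  # no torch -> the compatibility layer is never needed
    try:
        import paddle  # paddle may be absent outside a FastDeploy environment

        return bool(paddle.is_compiled_with_cuda())
    except Exception:
        return False
```

With this guard, an XPU machine that happens to have torch installed no longer trips over CUDA-only driver setup at import time.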
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| fastdeploy/worker/xpu_model_runner.py | Hook the guided decoding backend into the XPU inference flow, initialize the logits processor, and maintain guided decoding state before and after sampling |
| fastdeploy/model_executor/ops/triton_ops/triton_utils.py | Create the Triton driver only when torch is available and Paddle is compiled with CUDA, avoiding compatibility-layer issues in XPU + torch setups |
| fastdeploy/config.py | Remove the "XPU does not support guided decoding" restriction and adjust the postprocess warning; the dependency validation in check() still needs further fixes |
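The pre/post sampling steps mentioned for xpu_model_runner.py follow the usual guided decoding pattern: before sampling, mask logits of tokens the grammar forbids; after sampling, feed the chosen token back to advance the grammar state. A minimal toy sketch of that pattern (all names here — `GuidedState`, `apply_guided_mask`, `accept_token` — are illustrative, not FastDeploy APIs):

```python
from typing import List


class GuidedState:
    """Toy per-request grammar state: tracks which token ids are currently legal."""

    def __init__(self, allowed: List[int]):
        self.allowed = set(allowed)


def apply_guided_mask(logits: List[float], state: GuidedState) -> List[float]:
    """Pre-sampling step: set logits of grammar-forbidden tokens to -inf."""
    return [x if i in state.allowed else float("-inf") for i, x in enumerate(logits)]


def accept_token(state: GuidedState, token_id: int) -> None:
    """Post-sampling step: advance the grammar with the sampled token
    (a real backend such as xgrammar would update its FSM here)."""
    assert token_id in state.allowed, "sampled token violates the grammar"
```

In the real runner these two hooks wrap the sampler, so only grammar-legal tokens can ever be produced.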
Comments suppressed due to low confidence (1)
fastdeploy/config.py:2324
- This code unconditionally imports xgrammar whenever guided_decoding_backend != "off". As a result, if a user selects guided_decoding_backend="guidance" without xgrammar installed, config validation fails outright — inconsistent with postprocess(), which only validates llguidance for the guidance backend. Suggest validating dependencies per backend: import xgrammar only for the xgrammar backend, and check/import llguidance for the guidance backend.
```python
if self.structured_outputs_config.guided_decoding_backend != "off":
    # TODO: speculative decoding support guided_decoding
    assert (
        self.speculative_config.method is None
    ), "speculative decoding currently do not support guided_decoding"
    try:
        import xgrammar  # noqa
    except Exception as e:
        raise Exception(
            f"import XGrammar failed, please install XGrammar use `pip install xgrammar==0.1.19`. \n\t {e}"
        )
```
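The per-backend validation the review asks for could look like the sketch below. The function name `check_guided_decoding_deps` is illustrative (in the PR this logic lives inside check()), and the guidance-branch error message is an assumption:

```python
def check_guided_decoding_deps(backend: str) -> None:
    """Validate only the dependency the selected backend actually needs:
    xgrammar for the xgrammar backend, llguidance for the guidance backend."""
    if backend == "off":
        return
    if backend == "xgrammar":
        try:
            import xgrammar  # noqa: F401
        except Exception as e:
            raise Exception(
                f"import XGrammar failed, please install XGrammar use "
                f"`pip install xgrammar==0.1.19`. \n\t {e}"
            )
    elif backend == "guidance":
        try:
            import llguidance  # noqa: F401
        except Exception as e:
            raise Exception(f"import llguidance failed, please install llguidance. \n\t {e}")
```

This keeps check() consistent with postprocess(): selecting guidance no longer fails on a missing xgrammar.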
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           develop    #7531    +/-   ##
==========================================
  Coverage         ?   73.13%
==========================================
  Files            ?      419
  Lines            ?    57475
  Branches         ?     9002
==========================================
  Hits             ?    42037
  Misses           ?    12610
  Partials         ?     2828
```
Flags with carried forward coverage won't be shown.
Motivation
Support the Structured Outputs feature on XPU.
(screenshot: test results)
The following must additionally be installed in the XPU environment:
- torch 2.6.0+cpu
- xgrammar 0.1.19
Test results are shown in the screenshot above.
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.