[Feature] Support NaiveProposer for most cases #7669
huicongyao wants to merge 5 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
CI report generated from the code below (updated every 30 minutes):

1. Task overview: CI is still in progress; 1 required task failed (
2. Task status summary
   2.1 Required tasks: 2/10 passed
   2.2 Optional tasks — 22/26 passed
3. Failure details (required only): Approval — approval missing (confidence: high)

Root cause details:
Key logs: Suggested fix:
Fix summary: please have the designated reviewers (@freeliuzc / @Deleter-D) complete the approval. Related changes: the PR modifies Link: view logs
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #7669   +/-  ##
=========================================
  Coverage         ?   71.61%
=========================================
  Files            ?      397
  Lines            ?    55733
  Branches         ?     8715
=========================================
  Hits             ?    39914
  Misses           ?    13073
  Partials         ?     2746
```

Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry.
Pull request overview
This PR brings SpecMethod.NAIVE into the existing speculative decoding framework's unified proposer flow, so most speculative paths no longer depend on the special-case branch "proposer=None when NAIVE". It also adds the cu_batch_token_offset computation needed for the NAIVE + logprob scenario.
Changes:
- Introduce a NaiveProposer (no-op) for SpecMethod.NAIVE, and call proposer.run() uniformly in the GPU runner's dummy/postprocess flows.
- In the PD-disaggregated (prefill/decode) path, pass the first token's draft_token_ids for NAIVE mode, and initialize draft_tokens/seq_lens_this_time for the first step on the decode side.
- Add a speculate_compute_cu_batch_offset GPU op and compute cu_batch_token_offset in the NAIVE logprob build flow.
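The "unified proposer flow" described above can be sketched as follows (hypothetical simplified classes for illustration, not the actual FastDeploy implementation):

```python
class Proposer:
    """Base interface: every speculative method supplies a proposer."""

    def run(self, share_inputs):
        raise NotImplementedError


class NaiveProposer(Proposer):
    """No-op proposer for NAIVE: it proposes no extra draft tokens,
    but its presence removes the `proposer is None` special case."""

    def run(self, share_inputs):
        pass  # nothing to propose


def postprocess(proposer, share_inputs):
    # Before this PR the runner had to guard with `if proposer is not None:`.
    # With NaiveProposer, every method has a proposer, so the call is unconditional.
    proposer.run(share_inputs)


postprocess(NaiveProposer(), {})  # runs without a None check
```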
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| fastdeploy/worker/gpu_model_runner.py | NAIVE branch: complete draft_tokens initialization, unified proposer call in dummy/postprocess, and adjusted infer_seed update logic |
| fastdeploy/spec_decode/types.py | SpecMethod.NAIVE now creates a NaiveProposer instead of returning None |
| fastdeploy/spec_decode/naive.py | New NaiveProposer (no-op proposer) |
| fastdeploy/output/token_processor.py | Under the splitwise prefill role, NAIVE also fills draft_token_ids when only 1 token is generated, for decode-side initialization |
| fastdeploy/model_executor/layers/sample/ops/speculate_logprob_utils.py | New speculate_compute_cu_batch_offset Python wrapper |
| fastdeploy/model_executor/layers/sample/ops/__init__.py | Export speculate_compute_cu_batch_offset |
| fastdeploy/model_executor/layers/sample/logprobs.py | NAIVE logprob build uses accept_tokens[:real_bsz] and computes cu_batch_token_offset |
| fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py | More robust check for whether real_bsz_to_captured_size is empty, avoiding misjudging the empty-dict case |
| fastdeploy/engine/sched/resource_manager_v1.py | On the PD decode side, also copy draft_token_ids for NAIVE when receiving prefill output |
| fastdeploy/engine/common_engine.py | Same as above: copy draft_token_ids on the NAIVE + PD decode side |
| custom_ops/gpu_ops/speculate_decoding/speculate_logprob_utils.cu | New SpeculateComputeCuBatchOffset kernel and static op registration |
| custom_ops/gpu_ops/cpp_extensions.cc | New pybind export of speculate_compute_cu_batch_offset |
```diff
 def create_proposer(self, fd_config, **kwargs) -> Optional["Proposer"]:
     """Factory method: create the appropriate Proposer for this method.

     Args:
         fd_config: FDConfig instance.
         **kwargs: Method-specific args forwarded to the Proposer constructor.
             MTP requires: main_model, local_rank, device_id, share_inputs.

     Returns:
         Proposer instance, or None for NAIVE.
     """
     if self == SpecMethod.NAIVE:
-        return None
+        from fastdeploy.spec_decode.naive import NaiveProposer
+
+        return NaiveProposer(fd_config)
```
```python
    Proposer for the NAIVE speculative method.

    Does not propose draft tokens; it simply uses the framework
    to place the last autoregressively generated token in
    the first position of draft_tokens.
```
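A toy illustration of what "place the last generated token in the first position of draft_tokens" means (assumed list-based layout; the real code operates on paddle tensors):

```python
def seed_draft_tokens(draft_tokens, last_accepted):
    """Write each request's last accepted token into draft slot 0.

    draft_tokens: per-request draft slots (padded with -1 here);
    last_accepted: the last autoregressively generated token per request.
    """
    for i, tok in enumerate(last_accepted):
        draft_tokens[i][0] = tok
    return draft_tokens


drafts = [[-1, -1], [-1, -1]]
print(seed_draft_tokens(drafts, [7, 9]))  # → [[7, -1], [9, -1]]
```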
```python
speculate_compute_cu_batch_offset(
    share_inputs["cu_batch_token_offset"],
    share_inputs["accept_num"],
    max_occupied_slots,
)
```
```python
def speculate_compute_cu_batch_offset(
    cu_batch_token_offset: paddle.Tensor,
    accept_num: paddle.Tensor,
    real_bsz: int,
):
    """
    Compute cumulative batch offset via inclusive prefix sum of accept_num.
    """
    if current_platform.is_cuda():
        from fastdeploy.model_executor.ops.gpu import speculate_compute_cu_batch_offset

        speculate_compute_cu_batch_offset(cu_batch_token_offset, accept_num, real_bsz)
    else:
        raise NotImplementedError
```
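As a reference for the kernel's semantics, here is a plain-Python equivalent of the inclusive prefix sum described in the docstring (an assumed reading of the op, not the CUDA source):

```python
def cu_batch_offset_ref(accept_num, real_bsz):
    """Inclusive prefix sum over the first real_bsz entries of accept_num."""
    offsets, total = [], 0
    for n in accept_num[:real_bsz]:
        total += n
        offsets.append(total)
    return offsets


# Three active requests accepting 2, 1, and 3 tokens respectively:
print(cu_batch_offset_ref([2, 1, 3, 0], real_bsz=3))  # → [2, 3, 6]
```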
```diff
 # Get real shape (total num tokens)
-if self.speculative_decoding and all(self.real_bsz_to_captured_size.values()):
+if (
+    self.speculative_decoding
```
real_bsz_to_captured_size is empty in NAIVE mode, but all(self.real_bsz_to_captured_size.values()) evaluates to True, which causes this code path to fail at runtime.
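The pitfall the reviewer points out is that `all()` over an empty iterable is vacuously True, so an empty dict slips past the guard:

```python
real_bsz_to_captured_size = {}  # empty in NAIVE mode, per the review comment

# all() of an empty sequence is vacuously True, so this check alone
# does not prove the dict has usable entries:
print(all(real_bsz_to_captured_size.values()))  # → True

# A safer guard also requires the dict to be non-empty:
print(bool(real_bsz_to_captured_size) and all(real_bsz_to_captured_size.values()))  # → False
```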
```diff
 # NAIVE mode: one token per request, logits are already correct
 output_logits = logits
-token_ids = share_inputs["accept_tokens"][:max_occupied_slots, 0]
+token_ids = share_inputs["accept_tokens"][:real_bsz, 0]
```
```python
result.outputs.draft_token_ids = copy.deepcopy(token_ids)
elif (
    self.cfg.speculative_config.method == SpecMethod.NAIVE
    and self.cfg.scheduler_config.splitwise_role == "prefill"
```
The condition here is a bit messy; it would be better to branch directly on the method rather than mixing a len check with a method check.
```python
class NaiveProposer(Proposer):
    """
    Proposer for the NAIVE speculative method.
```
Which variables here actually require a proposer? Suggest removing this structure so we don't add redundant code.
At initialize_forwardmeta, `self.proposer.fd_config.model_config.moe_phase.phase = "decode"` is used.
```python
    raise ValueError(
        "Expected at least 2 draft tokens for speculative suffix decode, "
        f"but got {len(draft_tokens_to_write)} for request {request.request_id}."
    )

if self.spec_method in (SpecMethod.MTP, SpecMethod.SUFFIX):
```
SUFFIX does not need this token passed for now; besides, even if it is passed, it is dynamic here.
Before this NAIVE support, it was also passed for suffix; removing it would break the unit tests, so it is kept here for consistency.
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-11 17:31:44
📋 Review Summary
PR overview: fully integrates NaiveProposer into the framework, supporting the main scenarios (centralized, PD-disaggregated, logprob, overlap)
Scope of changes: fastdeploy/spec_decode/, custom_ops/gpu_ops/speculate_decoding/, fastdeploy/worker/gpu_model_runner.py, fastdeploy/model_executor/, fastdeploy/engine/, fastdeploy/output/
Impact tags: [Speculative Decoding] [OP] [Graph Optimization] [Engine]
📝 PR convention check
The ## Motivation and ## Modifications sections are empty (template placeholders only), and the required ## Accuracy Tests section is missing (the PR body substitutes a non-standard ## Function Tests). Suggested replacement for the full PR description below.
Suggested PR description (can be copied directly):
## Motivation
NaiveProposer was not effectively integrated into the inference framework in either the PD-disaggregated or the centralized scenario: in NAIVE mode `create_proposer` returned `None`, so many places needed a special `proposer is None` check; the logprob path lacked support for computing `cu_batch_token_offset`; and in PD-disaggregated scenarios `draft_token_ids` was not propagated correctly. This PR wires NaiveProposer through the full framework, supporting the centralized, PD-disaggregated, logprob, and overlap scenarios.
## Modifications
- `fastdeploy/spec_decode/naive.py`: add a `NaiveProposer` class whose `_run_impl` is a no-op
- `fastdeploy/spec_decode/types.py`: NAIVE's `create_proposer` now returns `NaiveProposer(fd_config)` instead of `None`
- `fastdeploy/worker/gpu_model_runner.py`: `insert_tasks_v1` gains a NAIVE path (writes 1 token, seq_len=1); `infer_seed` is updated in sync in NAIVE mode
- `fastdeploy/engine/common_engine.py`, `resource_manager_v1.py`: the PD-disaggregated decode-node condition is extended to NAIVE
- `fastdeploy/output/token_processor.py`: in the prefill phase, NAIVE mode sets `draft_token_ids` so the decode node can initialize its first decode step
- `custom_ops/gpu_ops/speculate_decoding/speculate_logprob_utils.cu`: add the `SpeculateComputeCuBatchOffset` kernel and its registration
- `fastdeploy/model_executor/layers/sample/logprobs.py`: the NAIVE logprob path calls the new kernel to compute `cu_batch_token_offset`
- `fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py`: fix a potential false entry when `real_bsz_to_captured_size` is an empty dict
## Usage or Command
`--speculative-config '{"method": "naive"}'`
## Accuracy Tests
N/A (NAIVE mode is a framework-integration change and does not affect model logits accuracy; function tests below)
| Test scenario | Mode | Status |
|---------|------|---------|
| Bare FD | Centralized | ✅ |
| Bare FD | P/D disaggregated | ✅ |
| Bare FD | Centralized + logprob | ✅ |
| Bare FD | PD disaggregated + logprob | ✅ |
| Bare FD | Overlap | ✅ |
| RL scenario | All naive spec features | To be tested |
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/model_executor/layers/sample/logprobs.py:191 | speculate_compute_cu_batch_offset is passed max_occupied_slots rather than real_bsz, which is inconsistent with token_ids[:real_bsz] in the same block |
| 🟡 Suggestion | custom_ops/gpu_ops/speculate_decoding/speculate_logprob_utils.cu | the new kernel lacks a unit test under tests/operators/ (required by A3) |
Overall assessment
The implementation is clear: the kernel registration, Python bindings, and call sites are all updated in sync, and the cudagraph empty-dict bug fix is welcome. Please confirm whether max_occupied_slots and real_bsz are always equal in NAIVE mode, and add a unit test for the kernel.
Motivation
Modifications
Usage or Command
`--speculative-config '{"method": "naive"}'`
Function Tests
Checklist
- Add at least a tag in the PR title. Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.