
[Cherry-Pick][BugFix] support deepgemm without bias input (#7559) #7565

Merged
zoooo0820 merged 1 commit into PaddlePaddle:release/2.6 from EmmonsCurse:cherry-pick/7559/release/2.6
Apr 23, 2026


@EmmonsCurse
Collaborator

@EmmonsCurse EmmonsCurse commented Apr 22, 2026

Cherry-pick of #7559 (authored by @zoooo0820) to release/2.6.

devPR: #7559


Motivation

Add handling for the case where DeepGEMM does not accept a bias parameter.

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
  • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure the change has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Apr 22, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-22 19:51:03

📋 Review Summary

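PR overview: Adds compatibility handling for DeepGEMM FP8 GEMM operators that do not support the bias keyword argument. A runtime probe at module load time selects the call path automatically.
Scope of changes: model_executor/layers/quantization/block_wise_fp8.py
Affected tags: OP, Quantization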


📝 PR Convention Check

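The Modifications section is empty. Please describe the specific changes made in this PR. A ready-to-copy example:

Modifications (ready to copy):

  • At module load time, use inspect.signature to probe whether fp8_gemm_nt accepts the bias keyword argument, and cache the result in the global variable _fp8_gemm_nt_has_bias_kwarg.
  • Add a branch to deep_gemm_fp8_gemm_nt: if bias is supported, call it as before. Otherwise, call the version without bias first, then add the bias with paddle.add.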

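Issues

Level | File | Summary
🟡 Suggestion | block_wise_fp8.py:62 | The exception-handling comment does not match the implementation and misleads readers
🟡 Suggestion | block_wise_fp8.py:157 | paddle.add creates an extra Tensor; consider an in-place operation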

Overall Assessment

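The overall approach is clear. Runtime capability detection is a reasonable way to handle signature differences across pybind11 functions, and the compatibility design is correct. Fix the misleading comment and evaluate the GPU memory impact of paddle.add before merging.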

Comment thread fastdeploy/model_executor/layers/quantization/block_wise_fp8.py
(layer_weight, layer_weight_scale_inv),
linear_out,
)
if bias is not None:

🟡 Suggestion: extra Tensor allocation

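linear_out = paddle.add(linear_out, bias) produces a new Tensor instead of operating in place. In large-batch or long-sequence scenarios, this adds an extra GPU memory peak. Consider whether the in-place paddle.add_(linear_out, bias) can be used, or confirm with the upstream code that the lifetime of linear_out allows an in-place operation.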
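To gauge the size of this concern, here is a rough back-of-the-envelope sketch. It assumes a bfloat16 `linear_out` of shape [num_tokens, out_features] and assumes the original buffer stays alive until the out-of-place add returns. The helper `extra_add_bytes` and the shapes used (7168 output features, 1024 and 16384 tokens) are illustrative only. They were not measured from this PR.

```python
# Bytes per element for a few common output dtypes.
DTYPE_BYTES = {"bfloat16": 2, "float16": 2, "float32": 4}


def extra_add_bytes(num_tokens: int, out_features: int, dtype: str = "bfloat16") -> int:
    """Extra bytes an out-of-place `out = paddle.add(out, bias)` needs vs. an in-place add.

    Out of place, a second [num_tokens, out_features] result buffer is allocated while
    the input `out` is still alive; in place, the result overwrites `out`. The bias
    ([out_features]) is needed either way, so it is not counted here.
    """
    return num_tokens * out_features * DTYPE_BYTES[dtype]


if __name__ == "__main__":
    for tokens in (1024, 16384):
        mib = extra_add_bytes(tokens, 7168) / 2**20
        print(f"{tokens:>5} tokens x 7168 features (bf16): +{mib:.0f} MiB transient peak")
```

With these assumed shapes, the extra peak is 14 MiB and 224 MiB respectively. The cost scales linearly with token count, which is why large-batch and long-sequence cases are the ones flagged. Whether the in-place add is safe still depends on linear_out not being referenced elsewhere.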

@codecov-commenter

Codecov Report

❌ Patch coverage is 38.46154% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@3d6d3a2). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...del_executor/layers/quantization/block_wise_fp8.py 38.46% 7 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7565   +/-   ##
==============================================
  Coverage               ?   73.75%           
==============================================
  Files                  ?      376           
  Lines                  ?    53097           
  Branches               ?     8303           
==============================================
  Hits                   ?    39161           
  Misses                 ?    11188           
  Partials               ?     2748           
Flag Coverage Δ
GPU 73.75% <38.46%> (?)

Flags with carried forward coverage won't be shown.
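The patch and project percentages can be cross-checked. Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against the ratio. The patch is assumed to touch 13 coverable lines. This figure is inferred, not stated: 8 uncovered lines at 38.46154% implies 13 in total, which leaves 5 hits. The project-wide 73.75% follows from the same formula applied to the totals in the coverage diff table. `coverage_pct` is a hypothetical helper written for this check.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Codecov-style coverage ratio: partially covered lines are not counted as covered."""
    return 100.0 * hits / (hits + misses + partials)


# Patch: 7 missing + 1 partial out of 13 lines (13 is inferred), leaving 5 hits.
patch = coverage_pct(hits=5, misses=7, partials=1)
# Project totals from the coverage diff table (39161 + 11188 + 2748 = 53097 lines).
project = coverage_pct(hits=39161, misses=11188, partials=2748)

print(f"patch:   {patch:.5f}%")
print(f"project: {project:.2f}%")
```

The low patch figure suggests that only one of the two call paths (with or without the bias keyword) is exercised in CI.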


@zoooo0820 zoooo0820 merged commit 258b22a into PaddlePaddle:release/2.6 Apr 23, 2026
50 of 55 checks passed
@EmmonsCurse EmmonsCurse deleted the cherry-pick/7559/release/2.6 branch April 24, 2026 08:18