Small attention optimization: pre-scale q tile with qk_scale #2157
Merged
AmesingFlank merged 2 commits into main on May 1, 2026
Conversation
Force-pushed from b3a887c to d4de5d5
Force-pushed from 0a7a1a9 to 5625c40
norx1991 approved these changes on Apr 30, 2026
Stacked PRs:

- `m_i` update in example attention kernel #2156

Small optimization, discovered by @cota. Here, `q` has size `(block_q, head_dim)`, which is smaller than `qk`, whose size is `(block_q, block_k)`. Typically `head_dim` is 64/128, which is likely smaller than `block_k`, so applying the scale to `q` instead of `qk` results in slightly less ALU pressure.

On TPUs, this optimization provides a 1~2 TFLOPs increase from a baseline of 290 TFLOPs. It's not much, but it's something.
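The change amounts to hoisting the softmax scale from the logits tile onto the smaller query tile. Below is a minimal sketch in plain JAX, not the actual kernel from this PR; the function names, block shapes, and `qk_scale` value are illustrative assumptions:

```python
import jax
import jax.numpy as jnp


def scale_qk(q, k, qk_scale):
    # Baseline: compute the logits tile, then scale it.
    qk = jnp.dot(q, k.T)   # (block_q, block_k)
    return qk * qk_scale   # block_q * block_k multiplies


def scale_q(q, k, qk_scale):
    # Optimized: pre-scale the query tile instead.
    # q is (block_q, head_dim), and head_dim (64/128) is typically
    # smaller than block_k, so this touches fewer elements.
    q = q * qk_scale       # block_q * head_dim multiplies
    return jnp.dot(q, k.T)  # same result up to float rounding


# Hypothetical tile sizes, chosen only for the demo.
block_q, block_k, head_dim = 128, 256, 64
key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (block_q, head_dim))
k = jax.random.normal(key_k, (block_k, head_dim))
qk_scale = 1.0 / jnp.sqrt(head_dim)

print(jnp.allclose(scale_qk(q, k, qk_scale),
                   scale_q(q, k, qk_scale), atol=1e-5))  # True
```

In a FlashAttention-style kernel the pre-scale would also move out of the loop over `k` blocks, so the `block_q * head_dim` multiplies are paid once per `q` tile rather than once per `k` block.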