Small attention optimization: pre-scale q tile with qk_scale#2157

Merged
AmesingFlank merged 2 commits into main from AmesingFlank/stack/32
May 1, 2026
Conversation

Contributor

@AmesingFlank AmesingFlank commented Apr 30, 2026

Stacked PRs:


Small optimization, discovered by @cota. Here, q is a tile of size (block_q, head_dim), which is smaller than qk, whose size is (block_q, block_k). head_dim is typically 64 or 128, which is likely smaller than block_k, so scaling q before the matmul instead of scaling qk afterwards touches fewer elements and results in slightly less ALU pressure.
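A minimal NumPy sketch of the idea (the shapes below are illustrative, chosen to match the benchmark config later in this PR; the actual kernel operates on Pallas tiles, not NumPy arrays). Both forms compute the same scaled logits, but pre-scaling multiplies block_q * head_dim elements instead of block_q * block_k:

```python
import numpy as np

# Illustrative tile shapes: head_dim (128) is smaller than block_k (1024).
block_q, block_k, head_dim = 1024, 1024, 128
qk_scale = 1.0 / np.sqrt(head_dim)

rng = np.random.default_rng(0)
q = rng.standard_normal((block_q, head_dim)).astype(np.float32)
k = rng.standard_normal((block_k, head_dim)).astype(np.float32)

# Before: scale the (block_q, block_k) logits tile
# -> block_q * block_k = 1,048,576 scalar multiplies for the scaling.
qk_before = (q @ k.T) * qk_scale

# After: pre-scale the smaller (block_q, head_dim) q tile
# -> block_q * head_dim = 131,072 scalar multiplies for the scaling.
qk_after = (q * qk_scale) @ k.T

# Mathematically identical; only floating-point rounding differs.
assert np.allclose(qk_before, qk_after, atol=1e-4)
```

With block_k = 1024 and head_dim = 128, the scaling step does 8x fewer multiplies, which is where the small ALU-pressure win comes from.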

On TPUs, with

export LIBTPU_INIT_ARGS="--xla_tpu_dvfs_p_state=7 --xla_tpu_scoped_vmem_limit_kib=65536"

B, H, S, D = 8, 32, 8192, 128,

block_sizes = [2, 1024, 1024]

This optimization provides a 1-2 TFLOP/s increase from a baseline of 290 TFLOP/s.

It's not much, but it's something.

stack-info: PR: #2156, branch: AmesingFlank/stack/31
stack-info: PR: #2157, branch: AmesingFlank/stack/32
@AmesingFlank AmesingFlank force-pushed the AmesingFlank/stack/31 branch from b3a887c to d4de5d5 Compare April 30, 2026 22:36
@AmesingFlank AmesingFlank force-pushed the AmesingFlank/stack/32 branch from 0a7a1a9 to 5625c40 Compare April 30, 2026 22:36
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 30, 2026
@AmesingFlank AmesingFlank changed the base branch from AmesingFlank/stack/31 to main May 1, 2026 15:49
@AmesingFlank AmesingFlank merged commit 97acc41 into main May 1, 2026
20 of 27 checks passed