[ttl] Coalesce consecutive cb_wait acquires into multi-tile cb_wait_front(N) with per-acquire src_idx #556

@brnorris03

Problem

A sequence of N consecutive cb_wait ops on the same DFB lowers to N separate cb_wait_front(1) / cb_pop_front(1) calls. The buffer exposes a single FIFO front pointer, so the auto-pop pass must place cb_pop ops in the same order in which it observes consumer uses. Out-of-declaration-order consumes (for example, a consumer that reads tile 2 before tile 1) cannot be expressed without violating the FIFO monotonicity of the front pointer.
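To make the constraint concrete, here is a toy Python model of a FIFO buffer with a single front pointer. `ToyCircularBuffer` and its methods are hypothetical stand-ins for illustration, not the real circular-buffer API. With per-tile wait/pop the consumer can only ever see the tile at the front, whereas one multi-tile wait exposes a window that can be read in any order.

```python
from collections import deque


class ToyCircularBuffer:
    """Toy FIFO with one front pointer (illustrative only, not the real API)."""

    def __init__(self, tiles):
        self._tiles = deque(tiles)
        self._visible = 0  # tiles exposed by the most recent wait_front

    def wait_front(self, n):
        # In hardware this would block; the toy model just checks availability.
        if n > len(self._tiles):
            raise RuntimeError("not enough tiles available")
        self._visible = n

    def read(self, src_idx):
        # Only tiles inside the waited-on window are addressable.
        if not 0 <= src_idx < self._visible:
            raise IndexError("src_idx outside the waited-on window")
        return self._tiles[src_idx]

    def pop_front(self, n):
        if n > self._visible:
            raise RuntimeError("cannot pop more tiles than were waited on")
        for _ in range(n):
            self._tiles.popleft()
        self._visible -= n


# Per-tile lowering: the consume order is forced to match the FIFO order.
cb = ToyCircularBuffer(["t0", "t1"])
per_tile_order = []
for _ in range(2):
    cb.wait_front(1)
    per_tile_order.append(cb.read(0))
    cb.pop_front(1)

# Coalesced lowering: one wait, indexed reads in any order, one pop.
cb = ToyCircularBuffer(["t0", "t1"])
cb.wait_front(2)
coalesced_order = [cb.read(1), cb.read(0)]
cb.pop_front(2)
```

In the per-tile case there is no way to read `t1` before `t0` without popping `t0` first, which is exactly the reordering the xfail test exercises.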

Reproducer

test/python/test_auto_pop_push.py::test_reordered_consumes_violate_fifo_xfail (branch bnorris/auto-push-pop-fixes).

The test is marked @pytest.mark.xfail(strict=True) and flips to PASS once this lands.

Proposed change

Detect strictly-consecutive acquires on the same DFB sync class (same block, only attach_cb / arith constants between them) and lower the group as one cb_wait_front(N) plus per-acquire src_idx in [0, N) plus one cb_pop_front(N). Each acquire then addresses its tile by index, decoupling consume order from release order.
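A rough sketch of the grouping rule in Python, assuming a block can be modelled as a flat list of ops. The `Op` record, `find_wait_groups`, and `lower_group` are hypothetical names for illustration only; the real analysis would run over MLIR ops and is not shown in this issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Ops allowed between acquires without breaking a group (per the proposal).
TRANSPARENT_OPS = {"attach_cb", "arith.constant"}


@dataclass(frozen=True)
class Op:
    name: str
    cb: Optional[str] = None  # hypothetical stand-in for the DFB sync class


def find_wait_groups(block: List[Op]) -> List[Tuple[str, List[int]]]:
    """Return (cb, op indices) for each run of strictly-consecutive cb_wait ops."""
    groups: List[Tuple[str, List[int]]] = []
    current_cb: Optional[str] = None
    current: List[int] = []

    def flush() -> None:
        nonlocal current_cb, current
        if current:
            groups.append((current_cb, current))
        current_cb, current = None, []

    for i, op in enumerate(block):
        if op.name == "cb_wait":
            if current and op.cb != current_cb:
                flush()  # a wait on a different buffer starts a new group
            current_cb = op.cb
            current.append(i)
        elif op.name in TRANSPARENT_OPS:
            continue  # does not break the run
        else:
            flush()  # any other op ends the run
    flush()
    return groups


def lower_group(cb: str, indices: List[int]) -> List[str]:
    """One wait of N tiles, one src_idx per acquire, one pop of N tiles."""
    n = len(indices)
    lowered = [f"cb_wait_front({cb}, {n})"]
    lowered += [f"op{op_index}: src_idx={k}" for k, op_index in enumerate(indices)]
    lowered.append(f"cb_pop_front({cb}, {n})")
    return lowered


block = [
    Op("cb_wait", "a"), Op("arith.constant"), Op("cb_wait", "a"),
    Op("attach_cb"), Op("cb_wait", "a"), Op("compute"),
    Op("cb_wait", "a"), Op("cb_wait", "b"),
]
groups = find_wait_groups(block)
```

Here the first three waits on `a` form one group of N = 3; the `compute` op ends the run, so the later wait on `a` and the wait on `b` each form their own group of one.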

TableGen extension: optional num_tiles and tile_offset attributes on CBWaitOp / CBPopOp (and their producer-side mirrors CBReserveOp / CBPushOp). num_tiles=0 marks a subsumed acquire whose cb_wait_front is elided; the SSA result still anchors downstream src_idx indexing.
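One possible way the attributes could be assigned within a coalesced group, sketched in Python. This assumes the first acquire in the group carries num_tiles=N and that tile_offset is the acquire's position in the group; the issue does not state which acquire is the leader, so treat both as assumptions. `WaitAttrs`, `annotate_group`, and `emitted_waits` are hypothetical helpers, not part of the dialect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WaitAttrs:
    num_tiles: int    # 0 marks a subsumed acquire (no cb_wait_front emitted)
    tile_offset: int  # index of this acquire's tile within the waited window


def annotate_group(n: int) -> List[WaitAttrs]:
    """Assumed scheme: the first acquire waits for all N tiles, the rest are subsumed."""
    if n < 1:
        raise ValueError("a group needs at least one acquire")
    return [WaitAttrs(num_tiles=n if k == 0 else 0, tile_offset=k) for k in range(n)]


def emitted_waits(attrs: List[WaitAttrs], cb: str = "cb0") -> List[str]:
    """Only acquires with num_tiles > 0 lower to a cb_wait_front call."""
    return [f"cb_wait_front({cb}, {a.num_tiles})" for a in attrs if a.num_tiles > 0]


attrs = annotate_group(3)
waits = emitted_waits(attrs)
```

Under this scheme a group of three acquires emits a single wait, while every acquire keeps a tile_offset that downstream src_idx indexing can use.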

Lowering: ConvertTTLToTTKernel.cpp reads the new attributes; ConvertTTLTileOpsToTTKernel.cpp folds tile_offset into flatSrcIndex. Independent of #555, but the unified ownership simplifies the coalescing analysis.

Related: #536.
