Problem
A sequence of N consecutive cb_wait ops on the same DFB lowers to N separate cb_wait_front(1) / cb_pop_front(1) calls. The buffer exposes a single FIFO front pointer, so the auto-pop pass must place cb_pop ops in the same order in which it observes consumer uses. Consumes that occur out of declaration order (for example, the consumer reads tile 2 before tile 1) cannot be expressed without violating the FIFO monotonicity of the front pointer.
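The constraint can be illustrated with a toy model. The sketch below is not the tt-metal API: `ToyCB` and both helper functions are hypothetical, and the model assumes that only tiles covered by the most recent wait are addressable. Under that assumption, a per-tile wait cannot reach the second tile first, while a batched wait can.

```python
from collections import deque


class ToyCB:
    """Hypothetical toy model of a FIFO buffer with a single front pointer."""

    def __init__(self, tiles):
        self._tiles = deque(tiles)
        self._visible = 0  # tiles covered by the most recent wait_front

    def wait_front(self, n):
        # Real hardware would block; the toy model only checks availability.
        if len(self._tiles) < n:
            raise RuntimeError("not enough tiles produced")
        self._visible = n

    def read(self, idx):
        # Assumption: only tiles inside the waited window are addressable.
        if not 0 <= idx < self._visible:
            raise IndexError(f"tile {idx} outside waited window [0, {self._visible})")
        return self._tiles[idx]

    def pop_front(self, n):
        # The front pointer only moves forward; popped tiles are gone.
        if n > self._visible:
            raise RuntimeError("cannot pop more tiles than were waited on")
        for _ in range(n):
            self._tiles.popleft()
        self._visible -= n


def per_tile_reordered(tiles):
    """wait(1) per acquire: try to read the second tile before the first."""
    cb = ToyCB(tiles)
    cb.wait_front(1)
    try:
        cb.read(1)
    except IndexError:
        return "rejected"
    return "ok"


def batched_reordered(tiles):
    """One wait(2): both tiles are addressable by index, in any order."""
    cb = ToyCB(tiles)
    cb.wait_front(2)
    second = cb.read(1)
    first = cb.read(0)
    cb.pop_front(2)
    return [second, first]
```

In this model the only way to reach the second tile after a wait of 1 is to pop the first tile, which discards it before it has been read.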
Reproducer
test/python/test_auto_pop_push.py::test_reordered_consumes_violate_fifo_xfail (branch bnorris/auto-push-pop-fixes).
@pytest.mark.xfail(strict=True). Flips to PASS once this lands.
Proposed change
Detect strictly-consecutive acquires on the same DFB sync class (same block, with only attach_cb / arith constants between them) and lower the group as one cb_wait_front(N), a per-acquire src_idx in [0, N), and one cb_pop_front(N). Each acquire then addresses its tile by index, decoupling consume order from release order.
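A minimal sketch of the detection step, assuming a block can be modeled as a flat list of ops. The `Op` record, the `sync_class` field, and both function names are hypothetical stand-ins for illustration, not the pass's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

# Ops allowed between two acquires without breaking a run.
TRANSPARENT = {"attach_cb", "arith.constant"}


@dataclass(frozen=True)
class Op:
    name: str
    sync_class: Optional[str] = None  # only meaningful for cb_wait


def group_consecutive_acquires(block):
    """Return runs of cb_wait op indices that could be coalesced."""
    groups = []
    current = []
    for i, op in enumerate(block):
        if op.name == "cb_wait":
            if current and block[current[0]].sync_class == op.sync_class:
                current.append(i)
            else:
                # An acquire on a different sync class starts a new run.
                if current:
                    groups.append(current)
                current = [i]
        elif op.name in TRANSPARENT:
            continue
        else:
            # Any other op ends the current run.
            if current:
                groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def lowering_plan(group):
    """One wait and one pop of size N, plus a src_idx per acquire."""
    n = len(group)
    return {
        "wait_front": n,
        "src_idx": {op_index: k for k, op_index in enumerate(group)},
        "pop_front": n,
    }
```

A run of length 1 yields the same wait(1) / pop(1) pair as the current lowering, so the sketch treats it as the degenerate case rather than filtering it out.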
TableGen extension: optional num_tiles and tile_offset attributes on CBWaitOp / CBPopOp (and their producer-side mirrors CBReserveOp / CBPushOp). num_tiles=0 marks a subsumed acquire whose cb_wait_front is elided; the SSA result still anchors downstream src_idx indexing.
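One way the attributes could be assigned across a coalesced group is sketched below. It assumes that the first acquire carries num_tiles=N with tile_offset=0 and that each subsumed acquire carries num_tiles=0 with its own offset. Only the num_tiles=0 convention comes from the description above; the rest, including the function names, is an assumption made for illustration.

```python
def annotate_group(n):
    """Attributes for n coalesced acquires, listed in program order."""
    if n < 1:
        raise ValueError("a group needs at least one acquire")
    attrs = []
    for k in range(n):
        attrs.append({
            # Assumption: the leading acquire waits for the whole group.
            "num_tiles": n if k == 0 else 0,
            "tile_offset": k,
        })
    return attrs


def emitted_waits(attrs):
    """cb_wait_front calls that survive; subsumed acquires emit nothing."""
    return [f"cb_wait_front({a['num_tiles']})" for a in attrs if a["num_tiles"] > 0]
```

For a group of three acquires this gives a single cb_wait_front(3), with offsets 0, 1, and 2 available to fold into the source index.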
Lowering: ConvertTTLToTTKernel.cpp reads the new attributes; ConvertTTLTileOpsToTTKernel.cpp folds tile_offset into flatSrcIndex. Independent of #555, but the unified ownership simplifies the coalescing analysis.
Related: #536.