Sharing two patterns observed in a recent multi-plan phase, in case they're useful signal for the framework. Not framing as bugs — they may be fundamental limits of static plan analysis, or they may be enhancement-shaped. Wanted to gauge maintainer interest before formalizing.
## Pattern 1: cross-plan contract drift
In a 5-plan phase with `parallel = true`, every plan had per-plan unit tests that passed. Plan-checker passed. Wave execution looked clean. At the end-to-end verification step, eight runtime defects surfaced, and the dominant pattern (six of the eight) was cross-plan I/O contract drift:
- Plan A's producer emitted output keyed `keyA`/`keyB`; Plan B's consumer read `keyC`/`keyD`.
- Plan A emitted a nested structure (`{outer: {inner: ...}}`); Plan B read the inner key flat at the top level.
- Plan A wrote `field` as a string; Plan B read `field.subfield`.
Each plan's unit tests used synthetic fixtures matching that author's mental model. All of them passed in isolation; the fixtures just did not match each other.
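To make the failure mode concrete, here is a minimal sketch of the first drift case above. All names (`produce`-side output, `consume`, the key names) are illustrative, not actual plan code; the point is that the consumer's test passes against its own invented fixture while the producer's real output would fail it.

```python
# Plan A's producer fixture: what A actually writes.
producer_output = {"keyA": 1, "keyB": 2}

# Plan B's consumer fixture: what B *assumed* it would read.
consumer_fixture = {"keyC": 1, "keyD": 2}

def consume(payload: dict) -> int:
    # Plan B's consumer reads keys that Plan A never emits.
    return payload["keyC"] + payload["keyD"]

# B's unit test passes against B's own synthetic fixture...
assert consume(consumer_fixture) == 3

# ...but fails at integration time against A's real output.
try:
    consume(producer_output)
except KeyError as missing:
    print(f"integration failure: missing key {missing}")
```

Both sides are green in isolation; the defect only exists in the gap between the two fixtures, which is exactly where per-plan checking doesn't look.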
Meta-pattern: plan-checker verifies each plan against its own goal but not the cross-plan I/O contracts that emerge when plans have producer/consumer relationships. Per-plan unit tests with mock fixtures are insufficient — they validate within an author's mental model, not across two authors' mental models.
Possible mitigation surface (thinking out loud, not yet a feature request): when plan-checker sees Plan B reading from a path that Plan A writes, could it require one of them to ship a contract test — a real fixture owned by the producer that the consumer's tests load? The contract becomes a checked-in artifact, not two unsynchronized prose specs.
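A rough sketch of what that producer-owned contract test could look like, assuming nothing about GSD internals (the fixture path, `produce`, and `consume` are all hypothetical): the producer serializes its real output as a checked-in artifact, and the consumer's test loads that artifact instead of inventing its own fixture.

```python
import json
import os
import tempfile

def produce() -> dict:
    # Plan A's real producer output (illustrative).
    return {"keyA": 1, "keyB": 2}

def consume(payload: dict) -> int:
    # Plan B's consumer, written against the producer's actual keys.
    return payload["keyA"] + payload["keyB"]

# Producer side: regenerate the fixture from real output and check it in.
# (A temp dir stands in for the repo path here.)
fixture_path = os.path.join(tempfile.gettempdir(), "plan_a_output.json")
with open(fixture_path, "w") as f:
    json.dump(produce(), f)

# Consumer side: the contract test loads the producer-owned fixture,
# so a key rename in produce() breaks this test, not the e2e run.
with open(fixture_path) as f:
    shared_fixture = json.load(f)

assert consume(shared_fixture) == 3
print("contract test passed")
```

The design point is ownership: only the producer regenerates the artifact, so there is exactly one source of truth for the shape, rather than two unsynchronized mental models.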
Questions for maintainers:
(a) Has anyone else seen this on multi-plan phases? What did you do about it?
(b) Is there an idiomatic GSD pattern for declaring a producer→consumer contract that already exists and we missed?
(c) Do maintainers consider this in-scope for plan-checker, or is it explicitly out (e2e is where this lands by design)?
## Pattern 2: unanchored quantitative thresholds in VALIDATION.md
Separately: a VALIDATION.md row encoded a numerical assertion ("≥60% reduction on metric X"). Plan-checker accepted it. At verification time, the measured value was 30% on the test fixture. The 60% number was speculative against an imagined production workload, and the test fixture's surface area physically capped the achievable value below 60%. We had to recalibrate to a fixture-specific floor (15%) and document the original 60% as deferred.
Meta-pattern: plan-checker has no way to flag "this number has no measurement behind it." Numbers in VALIDATION.md look like pass/fail gates but some are aspirational and others are anchored to a measurement — and they're indistinguishable in the artifact.
Possible mitigation: any quantitative threshold in VALIDATION.md without an explicit reference to a measurement source (or a `# speculative — verify in execution` tag) gets a soft warning from plan-checker. Doesn't block; just surfaces.
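A toy sketch of what that soft warning could be, under made-up assumptions about the row format, the regex, and the tag syntax (none of this is real plan-checker behavior): scan VALIDATION.md rows for numeric thresholds and warn on any row that carries neither a measurement reference nor a speculative tag.

```python
import re

def warn_unanchored(lines):
    """Return soft warnings for numeric thresholds with no anchor."""
    warnings = []
    for n, line in enumerate(lines, 1):
        # Crude threshold detector: comparison operator and/or percentage.
        has_number = re.search(r"[≥≤<>]=?\s*\d+|\d+%", line)
        # Crude anchor detector: a measurement note or a speculative tag.
        anchored = "measured" in line or "speculative" in line
        if has_number and not anchored:
            warnings.append(f"line {n}: unanchored threshold: {line.strip()}")
    return warnings

rows = [
    "| metric X | >=60% reduction |",
    "| latency  | <=200ms (measured: baseline run) |",
    "| memory   | <=512MB  # speculative |",
]
for w in warn_unanchored(rows):
    print(w)
```

Here only the first row would be flagged; the other two are distinguishable in the artifact itself, which is the property the current format lacks.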
Same question: in-scope for plan-checker, or is the answer "discipline at planning time, not tooling"?
Happy to formalize either of these as enhancement proposals if maintainers signal interest. Filing as Discussion first per CONTRIBUTING.md guidance.