Skip to content

Development happens here#51

Merged
msoos merged 152 commits intomasterfrom
develop
Apr 30, 2026
Merged

Development happens here#51
msoos merged 152 commits intomasterfrom
develop

Conversation

@msoos
Copy link
Copy Markdown
Collaborator

@msoos msoos commented Apr 19, 2026

No description provided.

msoos and others added 30 commits April 19, 2026 15:46
Hot AIG DFSs (get_dependent_vars, count_aig_nodes) were spending
~10% of total cycles in hashtable insertions and the associated
malloc/free. Switch to a mutable visit_epoch on AIG, bumped once
per traversal and compared as an integer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Package windows release too
Replaces the naive pairwise Tseitin visitor with the AIGToCNF encoder
(k-ary AND/OR fusion, ITE pattern detection, De Morgan flattening,
dedup/constant folding) — the same encoder the rebuild path already uses.
The y_hat remap now lives in the AIG so the encoder can consume raw
lit vars, and orig.sign() is baked into f.aig up front.
The sink isn't cex-specific: it routes AIGToCNF output into a per-formula
clause list while allocating helper vars from the solver.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewire encode_node() to collect k-ary AND/OR operands as aig_ptrs and run
structural simplification (constant fold, dedup, complementary-pair, and
absorption: AND(x, OR(x,...))=x / OR(x, AND(x,...))=x) before committing
to leaf literals. Catches shared sub-AIG patterns that the lit-level
dedup can't see because a sub-AIG and its NOT-wrapper have different
helpers.

Add pure-chain generators to the fuzzer (linear AND/OR chain, balanced
AND/OR tree) and a dedicated test-aig-to-cnf regression that asserts
chains up to width 256 fuse into a single k-ary gate (n+1 clauses,
1 helper). Hook into ctest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two complementary encoder improvements:

1. MUX3 fusion for nested ITEs. When the else branch of an ITE pattern
   is itself a fanout-1, uncached ITE, emit a single 6-clause MUX3
   gate with one helper instead of two 4-clause ITEs with two
   helpers. Each fusion saves 2 clauses and 1 helper. parse_ite_at
   splits the pattern-match out of try_ite so the outer match can
   inspect the inner AIG without committing to an encoding.

2. Canonical (var, sign) sort on group-CSE keys before lookup/insert.
   Previously CSE hits depended on whichever order the inputs
   arrived in, which only coincided with canonical order when
   normalize_inputs was enabled.

On the fuzzer's mixed workload (deep ITE chains, DNF covers, big
AND/OR chains) clause reduction improves from ~66% to ~69% and
helper reduction from ~78% to ~83% vs naive Tseitin, with MUX3
firing ~36 times per iteration on the manthan-shaped ITE chains.
All 500/500 fuzzer iterations pass equivalence + brute-force
truth-table checks; test-aig-to-cnf still passes all pure-chain
cases.
Track polarity of each helper via a pre-pass from the root, then emit
only the forward or reverse half of each helper's biconditional based on
whether it is consumed positively, negatively, or both. Pre-pass mirrors
the flattening used by the main encoder (k-ary AND, k-ary OR via NAND,
ITE, MUX3) so size-1 reductions and width-capped chunks inherit the
correct polarity. Group CSE is forced off under PG (reused helpers would
need both halves anyway).

Fuzzer now randomises PG per iteration by default (--pg / --no-pg force
a mode). 10000-iter random-PG run passes; clause reduction 82% on the
mixed workload vs 80% without PG.
msoos and others added 28 commits April 27, 2026 23:11
Silences a -Wnull-dereference warning at aig_rewrite.cpp:331. The lambda
guards (e.node && ...) on lines 320/322 confused GCC's flow analysis into
believing r.node could be null on the fast path. Document the actual
invariant (t_and nodes always have non-null children, see AIG::new_and)
with an assert; this also gives the compiler enough information to drop
the warning.
Replaces the success-only "def found" line with a per-variable summary
emitted at the end of every test (hit or miss): iters, miter U/S/T,
f-solver U/S/T, skip_big, costzero count, avg pattern size, AIG nodes,
and a `result=` field naming the termination reason (found, iter_limit,
miter_undef, miter_pin_undef, f_undef, costzero_limit). Columns are
fixed-width via setw so consecutive lines line up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Explains why pinning vs no-pinning are both over-approximations of
Skolem-validity, lists 2QBF backends (CADET, DepQBF, CAQE, ...), and
sketches a per-variable 2QBF-CEGAR alternative.
The visitor was always invoked with neg=false because transform applies
the outer edge sign itself via ~result. Every visitor's `neg ? ~x : x`
branch was therefore dead. Strip the parameter from the signature and
collapse each call site to its always-taken branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wraps the iter1 simplification strategy string in a switch over an
integer enum; 0 keeps the existing default, 1 is the alternate
'new model' string previously kept commented out next to it.
Exposed on the CLI as --puurastrategy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a per-test "aux" leaf set (other to-define and backward-defined
vars whose recursive deps don't reach `test`) so the candidate H can
disambiguate F-bifunctional X*. This collapses many costzero_limit
failures into real defs by giving the F-only call a richer assumption
set; conflict cores then yield patterns over (X, aux) instead of being
declared cost-zero.

New knob --unatedefrepaux N (0=input only, 1=+backward_defined,
2=+to-define / default). On genbuf7b4n with 500/500/20 budgets the
pass goes from 4 hits / 30 costzero to 27 hits / 6 costzero.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After the introduction of non-input H leaves (commit ce2eb75), the rep
pass could commit a wrong Skolem when an earlier H_prev used the
current test's var as an aux leaf.

Root cause: locking indicator-prev = TRUE after a commit, combined with
the Y- and Y'-side H-commits, forced y_test_Y = y_test_Y' on every
later miter call where t ∈ aux_prev. The miter then declared UNSAT for
spurious chain reasons, producing incorrect commits.

Fix: stop locking indicator-prev. Instead, assume indicator_test =
FALSE in base_assumps for the current test, so y_test_Y != y_test_Y'
holds in every miter call, keeping the SAT/UNSAT decision aligned with
the soundness condition. The existing Y/Y'-side H-commits already pin
y_prev to its committed H on each side, making the indicator lock
redundant for input-only / backward-only H's anyway.

Also adds a SLOW_DEBUG-gated post-commit check via
check_synth_funs_sat() that catches bad commits at the exact iteration
that introduced them, and VERBOSE_DEBUG dumps of aux selection,
patterns, and the H AIG. Both fuzzers (fuzz_synth.py,
fuzz_unate_def_rep.py) now also randomize --unatedefrepaux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_check only inspected stdout for the CORRECT line and never looked at
the process exit code, so segfaults and SIGABRTs from release_assert in
test-synth were silently swallowed (especially on non-final AIGs, where
the missing-CORRECT branch is skipped entirely). Now any non-zero exit
that isn't a clean INCORRECT (returncode 1) is reported as a bug, an
INCORRECT match in the output is reported explicitly, and all bug paths
print the command and the --seed reproduce line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both non-input-leaf modes added in ce2eb75 fail under fuzzing:

  - aux=2 (undefined to-define leaves) violates the
    check_pre_post_backward_round_synth invariant. The committed def of
    `test` reaches a leaf that's neither orig-sampl nor itself defined,
    tripping the assertion when test-synth loads
    `*-unsat_unate_def_rep.aig` (seed 1920751642475315400).

  - aux=1 (backward-defined leaves) breaks Manthan's BW-vars-have-unique-
    defs assumption. unate_def_rep commits Skolem H, not unique-defining
    functions: the miter UNSAT only guarantees that *replacing* y_test
    with H(input, y_aux) keeps F sat, not that y_test = H in every F-sat
    model. With aux=1, H(input, y_hat_aux) can pick a valid F-sat value
    of y_test that disagrees with y[test] in the specific F-sat model
    the CEX produced, leaving y[test] in needs_repair after
    find_better_ctx. find_next_repair_var then picks a backward_defined
    var as y_rep and trips the BW assertion at manthan.cpp:1569 (seed
    15223131052712053446).

Restoring full aux>=1 needs Manthan to support repair on Skolem-defined
BW vars. Until then clamp to 0 with a documented stub so the knob keeps
parsing for compatibility.

After fix: fuzz_synth.py --num 1500, fuzz_unate_def_rep.py 300,
fuzz_aig_to_cnf --num 1000, fuzz_aig_rewrite --num 1000 all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
unate_def_rep with aux>=1 had two soundness gaps that the recent fuzzer
(--num 1500) found:

1. The miter UNSAT only proves H is a *Skolem*: F-sat (x, y) implies
   exists F-sat (x, y') with y'[test] = H(input, y_aux). It does NOT
   prove uniqueness (F ⊨ y_test = H). When F is bifunctional, F-sat
   models can have y_test != H. Manthan's BW pipeline assumes
   y_hat[BW] = y[BW] in every F-sat model (find_better_ctx fixes BW
   vars by finding F-sat with y[v] = y_hat[v]); a Skolem-only commit
   can leave a BW var in needs_repair after find_better_ctx, tripping
   the assertion at find_next_repair_var (manthan.cpp:1569). Fix:
   after the existing miter UNSAT, run an extra check in f_solver
   (F-only, no commit clauses) that `F + y_test != H_top` is UNSAT.
   Only commit when uniqueness holds; otherwise mark Skolem-only and
   continue refining.

2. When H_test happens to be a constant or input-only AIG, get_var_types
   categorized it as extend_defined (which Manthan treats as input,
   dropping the y_test = H_test constraint). Subsequent commits whose
   miter UNSAT relied on the dropped constraint then produced wrong
   y_hat in Manthan, also tripping the BW assertion. Fix: track
   unate_def_rep commits in a `skolem_defined_vars` set on
   SimplifiedCNF (set via new set_def_skolem entry point) and force
   get_var_types to keep them in backward_synth_defined_vars regardless
   of structural deps. The set is round-tripped through the AIG dump
   format so test-synth's check_pre_post_backward_round_synth can also
   skip its strict invariant for these vars.

After fix: fuzz_synth.py --num 1500, fuzz_unate_def_rep.py 300
(rep_fired=25, all verified), fuzz_aig_to_cnf --num 1000, and
fuzz_aig_rewrite --num 1000 all pass with default --unatedefrepaux=2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document and self-check unate_def_rep's invariants so a future regression
fails fast at the introduction site, not at a downstream consumer:

  - Cheap asserts (always on, per test loop iteration):
      * test is a real, non-input, not-yet-tested var with an indicator
      * aux candidates are non-input, non-self, distinct, aux_mask agrees
      * pattern conflict literals live in input ∪ aux (no leak from the
        f_solver core through the filter)
      * before commit: H is non-null, target var has no def yet, and
        every direct AIG leaf of H is in input ∪ aux
      * after commit: defs[test] is set and is_skolem_defined(test) holds

  - SLOW_DEBUG checks (heavy, opt-in via constants.h):
      * per-commit defs_invariant() in addition to the existing
        check_synth_funs_sat() — catches structural breakage (cycles,
        dangling deps, bad categorization)
      * end-of-pass defs_invariant() so the pass boundary is a clean
        verification point even when no commit fired

Add an is_skolem_defined() accessor on SimplifiedCNF for the asserts.

Widen fuzzer corner coverage: --unatedefrepiters now also exercises 0
(inner loop never runs — no commits at all) and 10000 (stress
refinement); --unatedefrepconfl in fuzz_unate_def_rep.py now also
includes 1 (every solver call mostly times out). All other knobs already
covered 0/1/medium/high/very-high. Skolem-only-skipped count is now
printed in the end-of-pass stats line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two narrow self-checks for the skolem_defined_vars set, placed where
they document the invariant they protect:

  - defs_invariant() (always-on, since it's already mostly SLOW_DEBUG-
    gated by callers): every entry in skolem_defined_vars points at a
    valid, currently-defined, non-orig-sampl var. This catches
    set_def_skolem misuse or a copy/move that desyncs the set from
    `defs`.

  - get_var_types() at the end (SLOW_DEBUG_DO, since this function is
    on the hot path): no Skolem-committed var was placed into
    extend_defined_vars. The whole point of the Skolem flag is to keep
    such vars in backward_synth_defined_vars even when their AIG looks
    input-only or is constant — extend-defined gets treated as input by
    Manthan, dropping the y_test = H_test commit constraint. If the
    `&& !skolem_defined_vars.count(orig)` branch in the categorization
    above ever gets removed/broken, this fires immediately rather than
    surfacing as a Manthan BW assertion downstream.

Verified with SLOW_DEBUG-enabled build + fuzz_unate_def_rep.py 50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The uniqueness check at the miter-UNSAT branch was previously run against
a static `f_solver` built once from original F. That made it strict
relative to what's actually needed: the BW pipeline only requires
y_test = H to be implied by the *current* CNF (= F plus all prior
commits), not by original F. A later var's H can be Skolem-only in F
but uniquely-defining in cumulative F' once prior commits restrict the
model space; the static check rejected those chained-Skolem cases and
they exhausted the iter budget instead of committing.

Fix: after a successful commit on the s-side, mirror the same
y_test ⇔ H_top equivalence into f_solver. h_top_in_f and y_test_in_f
are still in scope from the uniqueness check above; the Tseitin chain
for H is already in f_solver from that call, so the mirror is just two
clauses. Subsequent uniqueness checks then operate against cumulative F'
without any soundness loss — the new clauses are exactly the def we
already verified is implied by F under the F-only check.

Also update the now-stale header doc (step 3b said "UNSAT → valid
Skolem; commit" but the code does a uniqueness check; step 5 didn't
mention the f_solver mirror) and the in-block comment that described
f_solver as "no commit clauses".

While here, the verb-1 per-var print line drops the redundant orig-var
column (already in the SLOW_DEBUG path) and uses COLCYN/COLDEF for the
stop_reason, matching the other [unate_def_rep] verb-1 lines.

Verified: fuzz_synth 1500, fuzz_unate_def_rep 300 (42/42 reps verified),
fuzz_aig_to_cnf 1000, fuzz_aig_rewrite 1000 all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the strict uniqueness check at the miter-UNSAT branch with a
feasibility check, and materialize the committed equivalence into
cnf.get_clauses() so downstream consumers (find_better_ctx_normal in
particular) see the def directly.

Why: on bifunctional benchmarks like factorization12, every var has
multiple valid Skolem witnesses for the same input X, so strict
uniqueness in F (or in cumulative F') can never hold and the previous
pass committed nothing. The looser check accepts a wider class of Hs.

The feasibility check is `F' ∧ (y_test = H_top)` SAT in f_solver. SAT
means H is at least sometimes a valid value for y_test in cumulative F'
— in particular it rules out the vacuous-miter-UNSAT case where no
F'-model has y_test = H at any X (which would silently lose X-
projections on commit). f_solver continues to accumulate every prior
commit's y_v ⇔ H_v so the check is against cumulative F', not original F.

Caveat: feasibility SAT is weaker than full Skolem (∀X. F-sat at X →
∃Y with y_test = H(X)). A pathological F that splits into two disjoint
input regions with disjoint forced y_test values can pass feasibility
but lose an X-projection on commit. The fuzzers did not surface such a
case in 1500 iterations, but the gap is real and a future-work item is
to replace the feasibility check with a CEX-loop Skolem check.

The materialization side: each successful commit's (test, H) gets
enqueued in `deferred_materialize` and processed after the per-test
loop. We can't materialize in the loop because cnf.new_var() would
shift cnf.nVars(), and that offset is the anchor for the Y'-side
encoding in the miter solver `s` (and in encode_h). Deferring is sound
because the materialization is only needed by find_better_ctx_normal,
which runs inside Manthan after synthesis_unate_def_rep returns.

Each AND helper allocated during materialization is committed via
set_def_skolem (not set_def) so:
  - get_var_types keeps it in backward_synth_defined even when its
    dep-chain is all input (a pure-input H produces such helpers);
    extend_defined would let Manthan treat it as input and drop the
    helper = AND constraint we just baked in;
  - check_pre_post_backward_round_synth's "vars in CNF must be defined
    in terms of orig_sampl_vars" invariant explicitly exempts Skolem-
    marked vars, which is necessary for chained-Skolem H's whose aux
    leaves are themselves non-orig-sampl backward-defined vars.

The Tseitin chain for H is also mirrored into f_solver on commit
(carried over from the previous commit), so the next var's feasibility
check operates against cumulative F' rather than original F — chained-
Skolem commits compose cleanly inside the loop.

Comments updated throughout (header step 3b/5, in-block comments at
the check site, the post-set_def_skolem block, and the f_solver mirror
block) to describe the new flow.

Verified: factorization12 commits 1 def in unate_def_rep (was 0),
synth completes; previously-failing fuzz_synth seed 14130824211102645510
now passes; fuzz_aig_to_cnf 1000, fuzz_aig_rewrite 1000,
fuzz_unate_def_rep 300 (44/44 reps verified), fuzz_synth 1500 all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Counts wall-clock time and call counts for the major per-iteration
phases (encode_h, miter solve, uniqueness encode/solve, F-only solve,
pattern build, commit) so we can attribute cost to specific steps.

On the slow benchmarks (amba3b5y, sdlx-fixpoint-5, usb-phy) the breakdown
shows the miter SAT solve at 83-91% of total time, with f_solve and
encode_h each in the 1-7% range — a useful map for further targeted
optimization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Trim unate_def_rep timing instrumentation to the 3 SAT calls

cpuTime() in the per-iter hot path was eating measurable time itself.
Drop time_setup, time_per_test_setup, time_encode_h, time_feas_encode,
time_pattern, time_commit. Keep miter/feas/f solve timings (the actual
expensive work) plus all the cheap integer counters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fc_int discarded std::from_chars's return value, so a non-numeric
argument like --unatedefrepaux stuff silently parsed as 0 instead of
throwing. Now check the errc and trailing-character position. fc_double
gets a matching invalid_argument message so failures point at the
offending input rather than a bare 'stod'.
@msoos msoos merged commit ece5005 into master Apr 30, 2026
18 checks passed
@msoos msoos deleted the develop branch April 30, 2026 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant