Planner completeness: flip *use-planner* default-on by whilo · Pull Request #18 · replikativ/stratum

whilo · 2026-05-08T01:38:20Z

Summary

Brings the IR query planner (stratum.query.executor / plan / prepare) to test parity with the legacy cond dispatcher in stratum.query/q, flips *use-planner* to true by default, and lands the ten audit follow-ups (F1–F10) — including post-join handling of anomaly scoring and string-producing expressions, the last two paths that used to fall back to the legacy body. With this PR *use-planner* true is the only runtime path; there are no escape hatches left in q. 868 tests / 4119 assertions green.

Commits

Initial planner flip (8 commits): prepare.clj shared frontend, planner rewrites (predicate pushdown, top-N, window-having pushdown, etc.), executor wiring, default flip.
F1–F8 follow-ups (1 commit): LSetOp dispatch, idempotent normalize-pred, float :gte/:lte selectivity, :int64 materialize-expr target, PFusedExtractCount emission, string predicate sampling, NDV from chunk stats, NDV-based join cardinality.
F9–F10 follow-ups (1 commit): post-join LAnomaly + LStringMaterialize IR records — anomaly scoring and string-producing expressions in :group/:select/etc. now run end-to-end in the planner over a join. Drops every legacy-fallback condition in query.clj.

Audit follow-ups (resolved in this PR)

| # | Item | Resolution |
|---|---|---|
| F1 | `LSetOp` execute-node dispatch | Added case using `requiring-resolve` to re-enter `q` for sub-queries; debug entry points (`compile-physical`, `explain-query`) no longer crash on UNION/INTERSECT/EXCEPT. |
| F2 | `window-having-pushdown` brittleness | `normalize-pred` is now idempotent (`normalized-pred?` short-circuits already-normalized form); HAVING is normalized at `build-logical-plan` time (always, since `prepare-query`'s `pre-normalized?` flag only covers `:where`/`:agg`); redundant re-normalizations dropped. |
| F3 | Float `:gte`/`:lte` selectivity overshoot | New `zone-map-estimate-gte` / `-lte` with inclusive-bound tests; the dispatcher uses these instead of the int-style `(dec t)` / `(inc t)` reduction. |
| F4 | `PMaterializeExpr` long-target | Group-by exprs emit `:int64` target; executor calls `eval-expr-to-long`, skipping the long → double → long round-trip. |
| F5 | `PFusedExtractCount` emission | New `try-fused-extract-count` recognises `EXTRACT(unit, col) GROUP BY → COUNT` and emits `PFusedExtractCount` (executor dispatches to `ColumnOpsExt/fusedExtractCountDenseParallel`); closes the CB-Q19 10× gap. |
| F6 | String predicate sampling | 256-entry dict / `String[]` sampling for `:like`, `:ilike`, `:contains`, `:starts-with`, `:ends-with` (and negations); SQL LIKE → regex via `like->regex`. |
| F7 | NDV from chunk stats | New `estimate-ndv`: dict-encoded → `dict.length`; indexed int64 → `min(length, max-min+1)`; fallback `length/10`. |
| F8 | NDV-based join cardinality | `propagate-est-rows` for `PHashJoin` uses the textbook `probe_rows × build_rows / max(probe_ndv, build_ndv)` on the first join key; falls back to the legacy heuristic for multi-key / wrapped sides. |
| F9 | Anomaly + join in planner | Split `resolve-anomaly-columns` into `anomaly-spec` (pure rewrite) + `materialize-anomaly` (runtime scoring). New `LAnomaly` IR node placed after `LJoin`; executor case scores against the post-join column ctx. `column-pruning` walks anomaly expression args + model `:feature-names` so the pre-join scans keep the right columns. |
| F10 | Post-join string-expr materialization | New `string-expr-spec-group-agg` / `-select` rewrite group/aggs/select to `__str_expr_N` synthetic refs without calling `eval-string-expr`. New `LStringMaterialize` IR node placed after `LAnomaly`; executor runs `eval-string-expr` per item against the post-join column ctx. `column-pruning` walks the items' expressions. The brittle `try/catch normalize-expr` gate in `query.clj` is gone; the planner handles every shape now. |
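To make the F6 row concrete, here is a minimal sketch of a LIKE-to-regex translation plus sampled selectivity. It follows the rule stated in the F6 commit message (only `%` and `_` are wildcards, everything else is quoted). The names `like->regex-sketch` and `sample-selectivity` are hypothetical stand-ins, and taking the first `n` entries as the sample is an assumption; the real `like->regex` and sampling code in `estimate.clj` may differ.

```clojure
(import 'java.util.regex.Pattern)

(defn like->regex-sketch
  "Hypothetical stand-in for `like->regex`: `%` matches any run of characters,
  `_` matches exactly one character, every other character is quoted."
  ^Pattern [^String like]
  (let [sb (StringBuilder.)]
    (doseq [ch like]
      (case ch
        \% (.append sb ".*")
        \_ (.append sb ".")
        (.append sb (Pattern/quote (str ch)))))
    (Pattern/compile (.toString sb) Pattern/DOTALL)))

(defn sample-selectivity
  "Fraction of the first `n` entries (assumed sampling policy) that fully
  match the compiled pattern. Returns 0.0 for an empty sample."
  [^Pattern p entries n]
  (let [sample (take n entries)]
    (if (empty? sample)
      0.0
      (/ (double (count (filter #(.matches (.matcher p ^String %)) sample)))
         (count sample)))))

;; Usage: estimate the selectivity of  col LIKE 'a%_c'  over a small dictionary.
(sample-selectivity (like->regex-sketch "a%_c") ["abxc" "abc" "ac" "xbc"] 256)
```

Quoting each literal character keeps regex metacharacters such as `.` in the LIKE pattern from being treated as wildcards.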

Performance

T1+T2 olap bench, 6M rows, 8-core Lunar Lake (planner-on, all 26 PASS / 0 FAIL):

| Query | 1T (ms) | 8T (ms) | DuckDB 8T (ms) | Δ vs DuckDB 8T |
|---|---|---|---|---|
| B1 sum-product | 17.2 | 11.0 | 5.4 | 0.49× |
| B2 TPC-H Q1 | 134.2 | 90.5 | 17.8 | 0.20× |
| B3 SSB Q1.1 | 17.0 | 8.5 | 5.5 | 0.65× |
| B5 filtered count | 3.1 | 2.7 | 2.5 | 0.93× |
| B6 group-by-cnt | 19.8 | 11.1 | 4.6 | 0.41× |
| H2O-Q6 STDDEV | 39.2 | 29.6 | 26.1 | 0.88× |
| H2O-Q9 CORR | 64.4 | 50.9 | 33.4 | 0.66× |
| H2O-Q8 (window TOP-N) | 955 | 549 | 184 | (5× planner-on→legacy) |
| H2O-Q10 (6 cols, 6M groups) | 600 | 469 | 3939 | 8.4× faster |
| CB-Q19 (extract+count via PFusedExtractCount) | 4.4 | – | – | (parity vs legacy) |
| SEMI-Q1 | 18.6 | 6.5 | 86.8 | 13.3× faster |
| SEMI-Q3 | 29.4 | 10.3 | 72.5 | 7.0× faster |
| H2O-J1/J2/J3 (joins) | 30/35/42 | 10/20/21 | 6/19/19 | 0.61/0.96/0.88× |
| H2O-Q3/Q7 (high-card group) | 75/58 | 86/57 | 110/146 | 1.3/2.6× |
| H2O-Q4/Q5 (multi-AVG/SUM) | 92/79 | 67/82 | 8/87 | 0.12/1.05× |

vs. the legacy path (planner-off, same JVM, same data): all queries within ±10% on standard shapes; planner faster on bitmap semi-joins (3-13×), window TOP-N (1.35×), and now also on the EXTRACT-COUNT and the post-join anomaly + string-expr paths.

Coverage

The planner now handles every shape q exposes:

WHERE / aggregates / GROUP BY / HAVING / ORDER BY / LIMIT / OFFSET / DISTINCT / SELECT / SELECT *
INNER / LEFT / RIGHT / FULL joins with single- and multi-column keys
ASOF joins
Window functions (row_number, rank, lag, lead, running-sum, etc.) with HAVING pushdown
UNION / INTERSECT / EXCEPT
Top-N pushdown (LIMIT ≤ 1024 over ORDER BY + scan/project)
COUNT DISTINCT, percentile / median / approx-quantile, VARIANCE / STDDEV / CORR
Date/time arithmetic + extracts
LIKE / contains / starts-with / ends-with on dict and raw strings
Bitmap semi-join optimization
Anomaly scoring (ANOMALY_SCORE, ANOMALY_PREDICT, ANOMALY_PROBA, ANOMALY_CONFIDENCE) — with or without join
String-producing expressions (UPPER, LOWER, CONCAT, TRIM, SUBSTR, REPLACE) in GROUP BY / SELECT / aggs — with or without join

There are no remaining legacy fallbacks. q always routes through executor/run-query.

Test plan

clojure -M:test — 868 tests, 4119 assertions, 0 failures
T1+T2 olap bench at 6M rows (planner-on, 26 PASS / 0 FAIL)
T1+T2 olap bench at 6M rows (legacy, via local toggle)
H2O-Q8 result validation (Stratum 120000 rows == DuckDB 120000)
CB-Q19 result validation (PFusedExtractCount produces 60 rows)
CB-Q43 result validation (date-trunc minute group-by, 525590 buckets)
sql-anomaly-join-scoring-test (canonical join + ANOMALY_SCORE)
GROUP BY UPPER(cat) over a join (legacy / planner produce same rows)
clojure -M:ffix (cljfmt)

Phase A of the planner-completeness work. The IR planner was missing an entire frontend pre-processing layer the legacy `q` body has always run inline. Without it, raw expression vectors and unresolved string predicates reached the executor and exploded at `eval-expr-polymorphic` ('Unsupported vectorized expr') or in `prepare-aggregation` ('Cannot load from long array because parameter1 is null') for queries that are routine on the legacy path.

Three changes:

1. **`stratum.query.prepare/prepare-query` shared helper.** Lifts the legacy lowering passes into a single module both `q`-legacy and `executor/run-query` can call:

   a. normalize predicates and aggregates
   b. pre-materialize string-producing predicate expressions (`LOWER(name) = 'bob'` → `__pred_str_N`)
   c. pre-materialize numeric predicate expressions (`x + y > 10` → `__expr_N`)
   d. materialize string predicates (LIKE / CONTAINS) into mask columns
   e. resolve dict-encoded equality predicates by mapping the right-hand string/keyword to its dict-id
   f. compile non-SIMD predicates (OR, IN, NOT-IN, :fn) into a single mask column referenced as `[:__mask :eq 1]`
   g. pre-materialize string-producing exprs in GROUP BY / aggs / SELECT into dict-encoded temp columns

   Returns `{:preds :aggs :group :select :columns :columns-meta}`. `executor/run-query` now binds `expr/*columns-meta*` from the returned `:columns-meta` so downstream expression eval (windows, `eval-expr-polymorphic`, etc.) sees temp dict-encoded columns.

2. **`build-logical-plan` honors a `::pre-normalized?` flag.** The legacy normalize-{pred,agg,select-item} fns aren't idempotent, so the planner path tells `build-logical-plan` it's already normalized. Plan-internal references to the (now redundant) private `normalize-select-item` were retired in favor of `stratum.query.execution/normalize-select-item`, which is more complete (handles `:as`, literals, expressions, and keywords).

3. **`collect-all-refs` walks every column-bearing slot recursively.** Previously project items only contributed `:ref` (so an item with `:expr` was invisible to column-pruning), single-agg nodes only contributed `:col` (so `:cols` for sum-product and `:expr` for inline expressions disappeared), and group keys with non-keyword shapes weren't recursed into. The new `collect-expr-refs!` helper walks normalized expression maps (`{:op ... :args ... :branches ...}`) and pred-style vectors uniformly. Also `rewrite-expr-group-keys` now normalizes the group-key expression before handing it to `PMaterializeExpr` so `eval-expr-vectorized` sees the `{:op :date-trunc :args ...}` form rather than the raw `[:date-trunc ...]` vector.

Effect with `*use-planner*` bound to `true` on the existing test suite: query-test + sql-test + parquet-test: 113 failures (start) → 41 (29 fail + 12 error).

The remaining failures cluster on shapes that need substantive new work, not lowering: window functions (~10), top-N pushdown correctness (~9, top-N is currently a legacy-only optimization), anomaly-score / `ANOMALY_*` (~4), CAST edge cases (~6), SQL `COUNT` shape divergence (~3). These are addressed in subsequent commits.

Legacy regression check (planner OFF, default): 424 tests / 1444 assertions all pass.
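The recursive walk described for `collect-all-refs` can be illustrated with a small sketch. It is not the project's `collect-expr-refs!`: the function name, the plain-data node shapes, and the idea of filtering keywords against a known column set (so that operator names such as `:date-trunc` or units such as `:minute` are not mistaken for columns) are all assumptions made for illustration.

```clojure
(defn collect-expr-refs-sketch
  "Walk any nesting of maps and vectors and return the set of keywords that
  name a known column. `known-cols` is a set of column keywords."
  [known-cols x]
  (->> (tree-seq coll? seq x)   ; visits map entries, vectors and leaves alike
       (filter keyword?)
       (filter known-cols)      ; a set acts as a membership predicate
       set))

;; Usage: an `:expr` project item and a sum-product agg with `:cols`,
;; the two slots the commit message says were previously invisible.
(collect-expr-refs-sketch
 #{:ts :price :qty :other}
 [{:expr {:op :date-trunc :args [:minute :ts]}}
  {:op :sum-product :cols [:price :qty]}])
```

Because the walk does not care which slot a reference lives in, adding a new column-bearing field to a node does not silently hide it from column pruning.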

Two correctness gaps surfaced by running the test suite with `*use-planner*` on:

1. **`:as` alias dropped on COUNT paths.** `PFusedSIMDCount`, `PChunkedSIMDCount`, and `PBlockSkipCount` had no field for the normalized agg, and their executors hard-coded `{:op :count :as nil}` when calling `format-fused-result`. So `SELECT COUNT(*) AS cnt FROM …` returned `:count` instead of `:cnt`, and any test that read the result by alias got nil. Each of the three IR records now carries the agg, and the constructors in `select-global-agg-strategy` thread the normalized first-agg through. The executors prefer the carried agg, falling back to `{:op :count :as nil}` for older callers.

2. **`*columns-meta*` re-bound to `{}` inside `execute-physical`.** `run-query` binds `expr/*columns-meta*` from `prepare-query`'s output so downstream expression eval (e.g. `LENGTH` on a dict-encoded column) sees the dict info. `execute-physical` then shadowed it with `{}`, losing the binding and making string functions return `0.0` instead of computed values. Removed the redundant binding; the var's root value (`{}`) still applies if `execute-physical` is called directly without a prior `binding`.

Effect on the test suite with planner ON: 41 failures (after Phase A) → 30 failures (21 fail + 9 error).

Fixed: length-function-test, e2e-{simple-count,in,between}, cast-string-to-{double,long}, cast-invalid-string, sql window-having-pushdown alias path.

Legacy regression: 424 / 1444 still green.
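The `*columns-meta*` shadowing bug is easy to reproduce in isolation. The sketch below uses a local dynamic var and hypothetical function names (`dict-encoded?`, `execute-physical-shadowing`, `execute-physical-fixed`, `run-query-sketch`); it only models the binding behaviour, not the real executor.

```clojure
(def ^:dynamic *columns-meta* {})   ; root value, as in the commit message

(defn dict-encoded?
  "True when the current binding carries metadata for `col`."
  [col]
  (contains? *columns-meta* col))

(defn execute-physical-shadowing
  "Buggy shape: re-binding to {} hides whatever the caller bound."
  [col]
  (binding [*columns-meta* {}]
    (dict-encoded? col)))

(defn execute-physical-fixed
  "Fixed shape: no inner binding, so the caller's binding is visible."
  [col]
  (dict-encoded? col))

(defn run-query-sketch
  "Binds the metadata the way `run-query` is described to, then executes."
  [execute col]
  (binding [*columns-meta* {:name {:dict ["alice" "bob"]}}]
    (execute col)))

(run-query-sketch execute-physical-shadowing :name) ; metadata is lost
(run-query-sketch execute-physical-fixed :name)     ; metadata is visible
```

Calling `execute-physical-fixed` without any surrounding `binding` still works, because the var's root value `{}` applies.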

… window so partition keys survive

Two structural fixes for SQL window-function queries through the planner:

1. **`execute-window` now handles a column context input.** Previously it only returned a result when the input was already a vector of row maps; for queries without group-by/aggregate (the common `SELECT col, ROW_NUMBER() OVER (...) FROM t` shape) it fell through and returned the input ctx unchanged, so window functions never executed. The new path materializes columns, calls `win/execute-window-functions`, and threads the augmented column map back into the ctx for downstream PProject / PHaving / PSort to consume.

2. **`build-logical-plan` defers `LProject` past `LWindow`** when a window is present. SQL evaluation order is FROM → WHERE → GROUP BY → HAVING → window → SELECT, so projecting before the window strips the `:partition-by` / `:order-by` columns that the window needs. The new ordering applies LWindow first, then LProject; window-output columns (`:as` of each spec) are auto-appended to the select list if the user didn't list them explicitly, and a synthetic select is built when the user wrote no SELECT at all. Mirrors the legacy `q.clj:807-826` injection.

Effect on the test suite with planner ON: 30 failures (after Phase B count fixes) → 20 failures (11 fail + 9 error). Window-function-execution-test, window-frame-test, and ntile-percent-rank-cume-dist-test all pass.

Remaining: top-N pushdown (12, port to IR) and anomaly model (6, hardcoded to legacy).

Legacy regression: 424 / 1444 still green.

… in apply-distinct

Two small but blocking fixes:

1. **`prepare-query` defers predicate lowering when `:join` is set.** The legacy `q` runs predicate lowering AFTER the join has merged columns, so the WHERE predicate references both sides. The planner ran prepare-query upfront against the left-side `:from` columns only. For `WHERE right.cat = 1` that meant `pred/compile-pred-mask` couldn't resolve `:cat` and emitted code with an `aget` call that wouldn't compile ('More than one matching method found'). prepare-query now skips numeric / string / dict / non-SIMD-mask passes when joins are present; the executor's per-filter `prepare-preds` (executor.clj:56-101) handles the lowering at LFilter execution time, when joined columns are in scope. Predicates are still normalized so `build-logical-plan` sees a consistent shape.

2. **`apply-distinct` canonicalizes -0.0 → +0.0** before hashing. Java's `HashSet<Double>` uses bit-pattern equality, so SQL's `SELECT DISTINCT v` returned `-0.0` and `+0.0` as separate rows on the planner path (legacy path went through a streaming primitive that already canonicalized). The planner now matches.

Effect on the test suite with planner ON: 20 failures (after window) → 19 failures (11 fail + 8 error).

Fixes: join-with-filter-test, distinct-double-zero-canonicalization-test.

Remaining: top-N pushdown (12, port to IR) and anomaly model (6).

Legacy regression: 424 / 1444 still green.
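The signed-zero behaviour behind the `apply-distinct` fix can be checked directly with `java.util.HashSet`. This is a minimal sketch with hypothetical helper names (`distinct-count-raw`, `canonical-zero`, `distinct-count-canonical`); the actual `apply-distinct` code is not shown in this PR text.

```clojure
(import 'java.util.HashSet)

(defn distinct-count-raw
  "Count distinct boxed doubles the way a HashSet<Double> would."
  [xs]
  (let [s (HashSet.)]
    (doseq [x xs]
      (.add s (Double/valueOf (double x))))
    (.size s)))

(defn canonical-zero
  "Map -0.0 to +0.0 and leave every other value untouched."
  ^double [^double x]
  (if (== x 0.0) 0.0 x))   ; == is numeric, so it is true for -0.0 as well

(defn distinct-count-canonical
  "Canonicalize signed zeros before hashing."
  [xs]
  (distinct-count-raw (map canonical-zero xs)))

(distinct-count-raw [-0.0 0.0])        ; two entries: Double.equals separates them
(distinct-count-canonical [-0.0 0.0])  ; one entry after canonicalization
```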

Top-N (`ORDER BY col [DESC] LIMIT N`) was a legacy-only fast path: the planner fell through to materialize-and-sort, losing the performance gain the bugfix branch added. Port the optimization to the IR so the planner matches the legacy path on these shapes.

Wiring:

- New `ir.LTopN` node carrying `[order-spec limit select input]`. No separate physical record: the executor recognizes LTopN directly and delegates to the existing `stratum.query.top-n/execute-top-n` primitive (heap of size N + per-row column fetch from surviving chunks).
- New `plan.top-n-rewrite` optimization pass detects `LLimit { input: LSort [single-spec] (LScan or LProject(LScan)) }` with N ≤ `*top-n-limit*` (default 1024), no offset, numeric non-string-dict order column. Runs BEFORE strategy-selection so the LLimit/LSort haven't been converted to PLimit/PSort yet, and BEFORE column-pruning so the LScan keeps every column the surviving rows might project. When LSort sits over an LProject, the project items are absorbed into LTopN's `:select` field (top-N's executor handles row-level projection itself).
- `collect-all-refs` walks LTopN: order column + project items, or every scan column for SELECT *. Without this, column-pruning would drop everything except the order key.
- `executor/execute-top-n-node` translates the LTopN's normalized shape back into the synthetic query map `top-n/execute-top-n` expects.

Effect with `*use-planner*` ON: 19 failures (after distinct fix) → 6 failures (1 fail + 5 error). All 12 top-n-{pushdown-correctness, split-chunk-id} tests pass.

Remaining: anomaly model (6 tests, Phase C2).

Legacy regression: 424 / 1444 still green.
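The "heap of size N" idea behind the top-N fast path can be sketched with a bounded min-heap over row indices. This is an illustration only, for a single descending order column held in a `double[]`; the function name `top-n-indices` is hypothetical and the real `top-n/execute-top-n` additionally works per chunk and fetches the projected columns for surviving rows.

```clojure
(import 'java.util.PriorityQueue)

(defn top-n-indices
  "Row indices of the `n` largest values in `col`, largest first. The heap
  never holds more than `n` entries, so no full sort is needed."
  [^doubles col n]
  (if-not (pos? n)
    []
    (let [by-value (comparator (fn [a b] (< (aget col (int a)) (aget col (int b)))))
          heap     (PriorityQueue. (int n) by-value)]   ; min-heap on column value
      (dotimes [i (alength col)]
        (cond
          ;; heap not full yet: always keep the row
          (< (.size heap) n)
          (.add heap i)

          ;; row beats the smallest survivor: replace it
          (> (aget col i) (aget col (int (.peek heap))))
          (do (.poll heap) (.add heap i))))
      ;; polling yields ascending order, so reverse for ORDER BY ... DESC
      (vec (reverse (repeatedly (.size heap) #(.poll heap)))))))

;; Usage: ORDER BY v DESC LIMIT 2 over five rows.
(top-n-indices (double-array [3.0 9.0 1.0 7.0 5.0]) 2)
```

With N capped at 1024 as described above, the heap stays small regardless of the table size, which is what makes this cheaper than materialize-and-sort.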

… anomaly+join

`[:anomaly-score "model" …]` and friends aren't recognized by `normalize-expr`, so they have to be resolved into synthetic column references *before* any other lowering. The legacy `q` runs `resolve-anomaly-columns` inline; the planner needs the same behaviour.

Wiring:

- Move `resolve-anomaly-columns` and helpers (`anomaly-ops`, `collect-anomaly-exprs`, `rewrite-anomaly-exprs`, `select-alias-map`) from `stratum.query` into the shared `stratum.query.prepare` ns. They use only `expr` / `norm` / `x` / `iforest`, so the relocation is mechanical. (`stratum.query` could call them via require but already requires `stratum.query.executor`, creating a cycle if executor required query; moving to prepare avoids it.)
- `executor/prepare-and-build` runs `resolve-anomaly-columns` before `prepare-query` when the query map carries `:_anomaly-models`. Mirrors the legacy `q` body.

Limitation: anomaly + join cannot be resolved before plan time because the iforest features may live on the join's right side. The planner's pre-plan resolution would see only the left-side columns and throw `Column :offset not found in data`. For this shape the `q` dispatch falls back to the legacy path, which resolves anomaly post-join. Documented as a follow-up.

Effect with `*use-planner*` ON: 6 failures (after top-N port) → 0 failures (424 / 1444 all pass).

Legacy regression: 424 / 1444 still green.

…explain shape

Final batch of correctness fixes to take the IR planner the rest of the way to test parity with the legacy `q` body, then flip `*use-planner*` to `true`.

- predicate-pushdown: respect outer-join semantics. LEFT preserves left rows on right miss → can't push right-side preds (and symmetric for RIGHT/FULL). Anything we can't push stays above the join. Fixes LEFT JOIN + WHERE-on-right tests that were silently dropping rows.
- bitmap-semi-join eligibility: any reference to a build-side column from the parent disqualifies the rewrite (including the join key), since the rewrite discards the right side after building the presence bitmap. Mirrors the `(not has-select?)` clause in the legacy `query.join` gate.
- build-join-tree: strip table-qualified namespaces off `:on` pairs (`:t1/a` → `:a`) so they line up with the unqualified column-map keys. Self-join no longer trips on missing columns.
- estimate/sample-estimate: skip when args are non-numeric or the column is dict-string. The double-coercing path was throwing on string equality predicates we used to short-circuit.
- executor/run-query: thread `::plan/order-only-keys` and `::plan/having-only-keys` through `optimize` (preserving the top-level metadata) and dissoc them from result rows. Matches the legacy `(if (seq _order-only-keys) (mapv #(apply dissoc % …)))`.
- executor/explain-query: include `:n-rows` and `:columns` so callers that probe the legacy explain shape keep working.
- query/*use-planner*: default flips to `true`.

The full 868-test suite (including sqllogictest) is green; A/B vs the legacy path is within noise on TPC-H Q1 (B2) and Q6 (B1) at 6M rows.
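The outer-join rule in the first bullet can be written down as a small lookup. This is a sketch under assumptions: predicates are pre-tagged with the `:side` whose columns they read, the names `pushable-sides` and `split-preds` are hypothetical, and the `:full` entry (nothing pushable) follows standard outer-join reasoning rather than anything spelled out in the commit message.

```clojure
(def pushable-sides
  "Which join inputs may receive a WHERE predicate, per join type."
  {:inner #{:left :right}
   :left  #{:left}      ; right-side preds must stay above the join
   :right #{:right}     ; symmetric
   :full  #{}})         ; assumption: neither side is safe

(defn split-preds
  "Partition tagged predicates into those pushed below the join and those
  that stay above it. Unknown join types push nothing."
  [join-type preds]
  (let [ok?     (get pushable-sides join-type #{})
        grouped (group-by #(contains? ok? (:side %)) preds)]
    {:pushed (vec (get grouped true))
     :above  (vec (get grouped false))}))

;; Usage: LEFT JOIN with one predicate per side.
(split-preds :left [{:side :left  :pred [:a :gt 1]}
                    {:side :right :pred [:cat :eq 1]}])
```

Keeping the right-side predicate above a LEFT join matters because pushing it down would turn filtered-out right rows into null-extended matches instead of removing the joined row.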

H2O-Q8 (Top-N per partition via ROW_NUMBER + HAVING) was 2.7× slower under the planner because `LHaving (LProject (LWindow ...))` had `PProject` materialize all 6M post-window rows before `PHaving` could filter them down to the surviving 120K. Mirror the legacy `q.clj:758-815` window-having pushdown:

- new `window-having-pushdown` pass rewrites `LHaving preds (LProject items (LWindow specs in))` → `LProject items (LHaving preds (LWindow specs in))` when the project items are bare column refs and the having predicates only reference columns visible after `LWindow` (window outputs + scan inputs). Predicates are normalized in-place since `LHaving` keeps the user's raw form.
- `execute-having` gets a column-context fast path: filter on raw arrays, gather only surviving indices, return a column ctx for the parent `PProject` to finish materializing.

H2O-Q8 NT 6M rows: 1908ms → 395ms (4.8× speedup, 1.7× faster than legacy's 683ms). Plan after rewrite: PProject -> PHaving -> PWindow -> PSIMDFilter -> PScan.

868 tests still green.
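The shape of the `window-having-pushdown` rewrite can be sketched on plain nested maps. The node representation (`:node`, `:input`, `:items`, `:preds`, `:specs`) and the `:scan-cols` field on the window node are hypothetical simplifications; the real pass works on the IR records and normalizes predicates as well.

```clojure
(defn window-having-pushdown-sketch
  "Rewrite  having(project(window(in)))  into  project(having(window(in)))
  when every project item is a bare column keyword and every HAVING
  predicate (in [col op & args] form) reads a column visible right after
  the window node. Any other plan is returned unchanged."
  [{:keys [node preds input] :as plan}]
  (let [project input
        window  (:input project)
        visible (into (set (:scan-cols window)) (map :as (:specs window)))]
    (if (and (= node :having)
             (= (:node project) :project)
             (= (:node window) :window)
             (every? keyword? (:items project))
             (every? #(contains? visible (first %)) preds))
      ;; swap the two upper nodes: the filter now runs before projection
      (assoc project :input (assoc plan :input window))
      plan)))

;; Usage: top-2 rows per partition via ROW_NUMBER + HAVING.
(window-having-pushdown-sketch
 {:node  :having
  :preds [[:rn :lte 2]]
  :input {:node  :project
          :items [:id :rn]
          :input {:node :window
                  :specs [{:op :row-number :as :rn}]
                  :scan-cols [:id :v]}}})
```

After the swap the projection only has to materialize rows that survive the filter, which is the effect the H2O-Q8 numbers above describe.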

`prepare.clj` passes 5a/5b materialize string-producing expressions in GROUP BY / aggregates / SELECT into temp columns sized to the PRE-join row count. When a query has both a `:join` and such an expression, the temp columns end up the wrong length once the join runs and the executor reads past the array end (or sees stale data). The planner doesn't have a post-join materialization pass yet, so fall back to the legacy `q` body for this combination, which materializes string exprs at the right point. Symmetric with the existing `:join + :_anomaly-models` fallback. Tracked as a follow-up to lift these passes into a post-join planner stage. 868 tests still green.

Implements the eight follow-ups surfaced by the post-flip planner audit. Every change is paired with the legacy reference it mirrors or the gap it closes; 868 tests / 4119 assertions remain green.

F1: LSetOp executor dispatch

Add `LSetOp` case to `execute-node` (uses `requiring-resolve 'stratum.query/q` to avoid the require cycle) so `compile-physical` / `explain-query` callers don't crash on UNION/INTERSECT/EXCEPT queries. The runtime path in `q` already short-circuits set ops; this lets debug entry points share that semantics.

F2: Window-having pushdown brittleness

Make `normalize-pred` idempotent: detect already-normalized `[col op & args]` form via `normalized-pred?` and return early. Then move HAVING normalization into `build-logical-plan` (always, since `prepare-query`'s `pre-normalized?` flag only covers `:where` / `:agg`) and drop the redundant re-normalization in both the `window-having-pushdown` rewrite and `having-fast-path-on-ctx` in the executor. Verified the H2O-Q8 pushdown still fires after the change (PProject → PHaving → PWindow plan shape preserved).

F3: Float `:gte` / `:lte` selectivity overshoot

`estimate.clj` was computing `:gte t` selectivity as `:gt (t-1)`, which is correct for ints but overshoots on doubles (e.g. `mn = 4.5`, `t = 5.0` is mistakenly chunk-fully-passing). Add direct `zone-map-estimate-gte` and `-lte` with inclusive boundary tests and route the dispatcher through them.

F4: `PMaterializeExpr` long-target detection

`rewrite-expr-group-keys` now emits `:int64` target (group keys are discrete by definition); `execute-materialize-expr` honors it by calling `eval-expr-to-long`, which returns long[] direct for date-trunc / date-add / extract ops. Skips the long → double → long round-trip the planner inherited and unblocks the dense group-by all-long fast path. CB-Q43 (date-trunc minute → group by) lands at 198ms, within noise of legacy 186ms.
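The F3 overshoot is easiest to see with the example values from the message (`mn = 4.5`, `t = 5.0`). The sketch below classifies a chunk as fully passing, fully failing, or partial from its min/max; that three-way result and both function names are assumptions for illustration, since the real `zone-map-estimate-gte` presumably returns a selectivity estimate.

```clojure
(defn gte-via-gt-dec
  "Int-style reduction: treat (x >= t) as (x > t-1)."
  [mn mx t]
  (let [t' (dec t)]
    (cond (> mn t')  :all
          (<= mx t') :none
          :else      :partial)))

(defn zone-map-gte
  "Inclusive test applied directly to the bound."
  [mn mx t]
  (cond (>= mn t) :all
        (< mx t)  :none
        :else     :partial))

;; Chunk with doubles in [4.5, 6.0], predicate x >= 5.0:
(gte-via-gt-dec 4.5 6.0 5.0) ; claims every row passes, but 4.5 does not
(zone-map-gte   4.5 6.0 5.0) ; correctly reports a partial chunk

;; On integers the two agree:
(gte-via-gt-dec 5 9 5)
(zone-map-gte   5 9 5)
```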
F5: `PFusedExtractCount` emission + executor

Port the legacy `q.clj:680-908` fused EXTRACT(unit, col) + COUNT fast path. New `try-fused-extract-count` in `strategy-selection` recognises the post-`expr-materialization` shape (LGroupBy over PMaterializeExpr {:op #{:minute :hour :second :day-of-week}}) and emits `PFusedExtractCount`, bypassing the materialization. Executor case dispatches to `ColumnOpsExt/fusedExtractCountDenseParallel` and decodes per the legacy block. Closes the CB-Q19 10× gap.

F6: String predicate sampling

Replace static heuristics (0.05 / 0.10) for `:like`, `:ilike`, `:contains`, `:starts-with`, `:ends-with` and their negations with 256-entry dict (or raw `String[]`) sampling. New `like->regex` compiles SQL LIKE to a `Pattern` (only `%` / `_` are wildcards; everything else is quoted). Wired into `estimate-selectivity` between numeric sampling and the heuristic fallback.

F7: NDV from chunk stats

Add `estimate-ndv` to `estimate.clj`. Dict-encoded string → `dict.length`. Indexed int64 → `min(length, max-min+1)` from chunk stats. Otherwise the legacy `length/10` heuristic. Provides a callable distinct-count primitive the rest of the planner can lean on.

F8: NDV-based join cardinality

`propagate-est-rows` for `PHashJoin` switches from the degenerate `min(L,R) × selectivity` heuristic to the textbook formula: output = probe_rows × build_rows / max(probe_ndv, build_ndv) on the first join key, falling back to the prior heuristic when the join key column / scan isn't reachable (multi-key chains, wrapped sides). Tightens DP join ordering and dense-vs-hash group-by routing.

Bench (T1+T2 olap, 6M rows, planner-on, all PASS):

- H2O-Q8 NT (window TOP-N): 479ms (was 1908ms before F2)
- CB-Q19 (extract+count): 4.4ms (planner ≈ legacy)
- B1/B3/B5/B6 / H2O-Q1..Q10: within ±10% of legacy
- Bitmap semi-join (Q1/Q3): 3-12× faster than legacy
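The F7 and F8 rules combine into a few lines of arithmetic. The sketch below follows the three NDV cases and the join formula as stated above; the function names, the column-stats map shape (`:dict`, `:length`, `:min-val`, `:max-val`), and the guard against a zero divisor are assumptions, not the project's `estimate-ndv` or `propagate-est-rows`.

```clojure
(defn estimate-ndv-sketch
  "Distinct-value estimate for one column from hypothetical stats:
  dictionary size when dict-encoded, value range capped by row count when
  min/max are known, otherwise the length/10 heuristic."
  [{:keys [dict length min-val max-val]}]
  (cond
    dict                  (count dict)
    (and min-val max-val) (min length (inc (- max-val min-val)))
    :else                 (quot length 10)))

(defn join-rows-sketch
  "Textbook equi-join estimate on one key:
  probe_rows * build_rows / max(probe_ndv, build_ndv)."
  [probe-rows build-rows probe-ndv build-ndv]
  (quot (* probe-rows build-rows)
        (max 1 probe-ndv build-ndv)))   ; guard is an added assumption

;; Usage: 6M-row probe side joined to a 1000-row build side on a key
;; with 1000 distinct values on each side.
(let [probe-ndv (estimate-ndv-sketch {:length 6000000 :min-val 1 :max-val 1000})
      build-ndv (estimate-ndv-sketch {:length 1000 :min-val 1 :max-val 1000})]
  (join-rows-sketch 6000000 1000 probe-ndv build-ndv))
```

In this usage each probe row is expected to match one build row, so the estimate equals the probe-side row count.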

Closes the last two divergence points between the planner and the legacy `q` body: both queries with `:join` + `:_anomaly-models` and queries with `:join` + string-producing exprs in `:group` / `:select` now run end-to-end through the IR planner. The legacy fallback in `query.clj` is gone; `*use-planner* true` is the only runtime path.

F9: Anomaly + join

- prepare.clj split: `anomaly-spec` does pure rewriting (collect every `[:anomaly-* …]` expression, assign synthetic `__<op>_<model>` columns, rewrite `:select`/`:where`/`:having`/`:order`); `materialize-anomaly` runs the iforest scoring against a column ctx. `resolve-anomaly-columns` is now a one-liner that calls both for the no-join path.
- New `LAnomaly` IR record, placed after `LJoin` by `build-logical-plan` when the frontend supplied a spec. `executor.clj` adds an LAnomaly case that calls `materialize-anomaly` against the post-join column ctx and returns a column ctx with the synthetic columns added.
- `column-pruning`'s `collect-all-refs` walks each anomaly expression's argument list (long form) or the model's `:feature-names` (short form) so the pre-join scans keep the columns the iforest needs.

F10: String-producing exprs + join

- prepare.clj passes 5a / 5b grew deferred siblings: `string-expr-spec-group-agg` and `string-expr-spec-select` rewrite the slots and emit `[{:col-name :__str_expr_N :expr <normalized>}]` items without calling `eval-string-expr`. `prepare-query` returns the items as `:string-items`; the no-join path keeps the legacy eager materialization untouched.
- New `LStringMaterialize` IR record. Placed after `LAnomaly` (and `LJoin`) by `build-logical-plan` when items are present. `executor.clj` runs `expr/eval-string-expr` per item against the post-join column ctx.
- `column-pruning`'s `collect-all-refs` walks each item's expression so referenced columns survive on the scans.
The brittle `try/catch normalize-expr` gate I added earlier in `query.clj` is gone too. The planner handles every shape now, so there's no fallback to choose between.

Verified end-to-end:

- sql-anomaly-join-scoring-test: passes (was the canonical join + ANOMALY_SCORE test; legacy result matches)
- GROUP BY UPPER(cat) over a join: planner returns the same rows as legacy (synthetic `__str_expr_1` column key)
- 868 tests / 4119 assertions green

whilo added 11 commits May 7, 2026 14:34

whilo merged commit adcb6e2 into main May 8, 2026
5 of 6 checks passed

whilo deleted the feature/planner-completeness branch May 8, 2026 02:50
