Fix/iforest benchmark tests by whilo · Pull Request #10 · replikativ/stratum

whilo · 2026-04-14T00:00:08Z

No description provided.

- Rewrite AUC-ROC to use primitive arrays instead of boxed Clojure vectors (Http, 567K rows: 8+ minutes → milliseconds)
- Switch ODDS download URLs from broken Stony Brook site to Dropbox mirrors
- Add h5py support for MATLAB v7.3 files (http.mat, forestcover.mat)
- Auto-download ODDS data when missing via ensure-odds-data
- Fix reflection warnings (type hints on features/labels arrays)
- Fix mammography AUC threshold (0.80 → 0.70 to match actual performance)
- Fix server.clj formatting
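To illustrate the first item, here is a minimal Python sketch of AUC-ROC computed over flat typed arrays with a single sort. It is not the Clojure code from this PR: the function name `auc_roc`, the rank-sum (Mann-Whitney U) formulation, and the convention that label 1 marks an anomaly with higher scores meaning "more anomalous" are all assumptions made for the example.

```python
from array import array


def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum formulation, using average ranks for ties.

    scores: array('d') of anomaly scores (assumed: higher = more anomalous)
    labels: array('b') of 0/1 ground-truth labels (assumed: 1 = anomaly)
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # Sort indices once by score; everything after this is a linear scan.
    order = sorted(range(n), key=scores.__getitem__)

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the group
        for k in range(i, j + 1):
            if labels[order[k]] == 1:
                rank_sum_pos += avg_rank
        i = j + 1

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


scores = array("d", [0.1, 0.4, 0.35, 0.8])
labels = array("b", [0, 0, 1, 1])
print(auc_roc(scores, labels))
```

The sort dominates the cost at O(n log n), and the flat arrays avoid per-element boxing, which is the kind of overhead the commit message attributes the original slowdown to.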

Our tree builder stopped splitting immediately when the randomly chosen feature had no variance, while sklearn tries other features first. This caused 7% premature leaf exits on mammography (6 features), trapping 33% of data points in oversized leaves and compressing the path length range.

- Mammography AUC: 0.73 → 0.86 (matches sklearn's 0.87)
- ForestCover AUC: 0.80 → 0.87
- Sample-size stable: AUC 0.86 from size 64 through 2048 (was 0.49 at 2048)
- Zero impact on scoring performance (only training affected)
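The split-selection change described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the project's tree builder or sklearn's splitter: `choose_split` and its arguments are hypothetical names, and rows are assumed to be tuples of numeric features.

```python
import random


def choose_split(rows, n_features, rng):
    """Pick a random (feature, threshold) split for an isolation tree node.

    Features are tried in random order; a feature with no variance in this
    node is skipped instead of ending the split attempt. Returns None only
    when every feature is constant, in which case the node becomes a leaf.
    """
    candidates = list(range(n_features))
    rng.shuffle(candidates)
    for feature in candidates:
        lo = min(row[feature] for row in rows)
        hi = max(row[feature] for row in rows)
        if lo < hi:
            return feature, rng.uniform(lo, hi)
    return None  # all features constant: nothing left to split on


rng = random.Random(0)
# Feature 0 is constant here, so only feature 1 can be chosen.
print(choose_split([(1.0, 5.0), (1.0, 7.0)], 2, rng))
```

The earlier behaviour corresponds to drawing a single feature and returning a leaf as soon as that one feature is constant, which is what produced the premature leaf exits mentioned above.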

…RIBE MODEL

Enables end-to-end anomaly detection from any PostgreSQL client without Clojure. Follows BigQuery ML CREATE MODEL syntax.

- sql.clj: regex-based parsing for model DDL statements, parse-model-options helper for OPTIONS clause
- server.clj: :create-model/:drop-model DDL handlers, extensible model-type-map, fix encoded column unwrapping in resolve-anomaly-expressions
- sql_test.clj: parse-level tests for all 4 statements, full end-to-end test (CREATE TABLE → INSERT → CREATE MODEL → SHOW → DESCRIBE → ANOMALY_SCORE → DROP)
- Updated docs: anomaly-detection.md, sql-interface.md, README.md
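As a rough picture of what regex-based parsing of a model DDL statement can look like, here is a Python sketch. The statement shape is modelled on BigQuery ML's `CREATE MODEL ... OPTIONS(...) AS SELECT ...`; the exact grammar accepted by sql.clj, the option names (`model_type`, `n_trees`, `contamination`), and the table and column names are assumptions for illustration. The Python function `parse_model_options` only borrows the name of the helper mentioned above and is not its implementation.

```python
import re

# Assumed statement shape: CREATE MODEL <name> OPTIONS (<k>=<v>, ...) AS <query>
CREATE_MODEL_RE = re.compile(
    r"^\s*CREATE\s+MODEL\s+(?P<name>\w+)\s+"
    r"OPTIONS\s*\((?P<opts>.*?)\)\s+"
    r"AS\s+(?P<query>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# key = 'quoted string'  or  key = bare_token
OPTION_RE = re.compile(r"(\w+)\s*=\s*('[^']*'|[^,\s]+)")


def _coerce(raw):
    """Turn an option literal into str, int, or float."""
    if raw.startswith("'"):
        return raw[1:-1]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave unrecognised bare tokens as strings


def parse_model_options(text):
    """Parse the inside of an OPTIONS(...) clause into a dict."""
    return {key.lower(): _coerce(raw) for key, raw in OPTION_RE.findall(text)}


def parse_create_model(sql):
    """Return the parsed statement as a dict, or None if it does not match."""
    m = CREATE_MODEL_RE.match(sql)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "options": parse_model_options(m.group("opts")),
        "query": m.group("query"),
    }


stmt = (
    "CREATE MODEL fraud OPTIONS (model_type='isolation_forest', "
    "n_trees=100, contamination=0.05) AS SELECT amount, freq FROM tx;"
)
print(parse_create_model(stmt))
```

A regex approach like this stays small but cannot handle nested parentheses or quoted commas inside option values; a real grammar would be needed for those cases.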

Anomaly scoring now runs inside q/q after JOIN materialization instead of eagerly in server.clj. This enables scoring across join results, index-backed columns, and arbitrary expressions. Short form uses model's stored feature names; long form maps positional args to features with expression evaluation.
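The difference between the short and long forms can be sketched in Python as below. Everything here is hypothetical: `feature_vector`, the dict-based row, and the use of callables to stand in for SQL expressions are illustration devices, not the stratum API.

```python
def feature_vector(model_features, row, arg_exprs=None):
    """Build the numeric input vector for one (already joined) row.

    Short form: no arguments, so columns are looked up by the model's
    stored feature names.
    Long form: arguments are matched to features by position, and each
    argument is an expression evaluated against the row.
    """
    if arg_exprs is None:
        return [float(row[name]) for name in model_features]
    if len(arg_exprs) != len(model_features):
        raise ValueError(
            f"expected {len(model_features)} arguments, got {len(arg_exprs)}"
        )
    return [float(expr(row)) for expr in arg_exprs]


model_features = ["amount", "freq"]           # assumed stored feature names
row = {"amount": 10, "freq": 2, "fee": 1}     # one row after JOIN materialization

print(feature_vector(model_features, row))
print(feature_vector(model_features, row,
                     [lambda r: r["amount"] + r["fee"], lambda r: r["freq"] * 2]))
```

Because the vector is built from the materialized row, any column reachable after the join can feed the model, which is the capability the paragraph above describes.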

whilo added 4 commits April 13, 2026 17:45

whilo force-pushed the fix/iforest-benchmark-tests branch from f74d299 to 1964174 April 14, 2026 00:46

whilo merged commit 8a26a1b into main Apr 14, 2026
5 of 6 checks passed

whilo deleted the fix/iforest-benchmark-tests branch April 14, 2026 00:49

whilo merged 4 commits into main from fix/iforest-benchmark-tests
