
Fix/iforest benchmark tests #10

Merged
whilo merged 4 commits into main from fix/iforest-benchmark-tests on Apr 14, 2026

whilo (Member) commented Apr 14, 2026

No description provided.

whilo added 4 commits April 13, 2026 17:45
- Rewrite AUC-ROC to use primitive arrays instead of boxed Clojure vectors
  (Http dataset, 567K rows: 8+ minutes → milliseconds)
- Switch ODDS download URLs from broken Stony Brook site to Dropbox mirrors
- Add h5py support for MATLAB v7.3 files (http.mat, forestcover.mat)
- Auto-download ODDS data when missing via ensure-odds-data
- Fix reflection warnings (type hints on features/labels arrays)
- Fix mammography AUC threshold (0.80 → 0.70 to match actual performance)
- Fix server.clj formatting
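
The new AUC-ROC code itself is not shown here. As a rough illustration of an approach that sorts once and keeps scores and ranks in flat arrays, here is a minimal rank-based (Mann-Whitney U) sketch in Python. It assumes higher scores mean "more anomalous" and label 1 marks an anomaly; Python's `array` module stands in for JVM primitive arrays, and the name `auc_roc` is hypothetical. Sorting once makes it O(n log n) instead of comparing every positive/negative pair.

```python
from array import array


def auc_roc(scores, labels):
    """Rank-based AUC-ROC (Mann-Whitney U).

    Assumes higher score = more anomalous and label 1 = anomaly.
    Tied scores receive the average of their ranks.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort indices by score once: O(n log n) instead of comparing every pair.
    order = sorted(range(n), key=scores.__getitem__)
    ranks = array("d", [0.0]) * n  # flat array of doubles
    i = 0
    while i < n:
        # Extend j over the run of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic of the positive class, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


scores = array("d", [0.1, 0.4, 0.35, 0.8])
labels = array("b", [0, 0, 1, 1])
print(auc_roc(scores, labels))  # 0.75
```

Averaging ranks over ties keeps the result equal to the probability that a random anomaly outscores a random normal point, counting ties as one half.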
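
A sketch of the download-if-missing pattern behind ensure-odds-data, plus one way to choose a loader for MATLAB v7.3 files, which are HDF5 containers (HDF5 permits its superblock at offset 0, 512, 1024, 2048, ...; v7.3 .mat files place it after a 512-byte header). All function names are hypothetical, the Dropbox mirror URLs are deliberately not reproduced, and the loader assumes h5py, NumPy and SciPy are installed.

```python
import os
import urllib.request

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` only when the file is missing (hypothetical helper)."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        tmp = path + ".part"
        fetch(url, tmp)        # download to a temporary name first...
        os.replace(tmp, path)  # ...so an interrupted download never looks complete
    return path


def is_hdf5(path):
    """True if an HDF5 superblock signature sits at one of the first allowed offsets.

    MATLAB v7.3 .mat files are HDF5; older v5/v7 .mat files are not.
    """
    with open(path, "rb") as f:
        for offset in (0, 512, 1024, 2048):
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
    return False


def load_mat_variable(path, key):
    """Load one variable from a .mat file; assumes h5py, NumPy and SciPy are installed."""
    if is_hdf5(path):
        import h5py

        with h5py.File(path, "r") as f:
            # MATLAB writes column-major data, so h5py sees matrices transposed.
            return f[key][()].T
    from scipy.io import loadmat

    return loadmat(path)[key]
```

Keeping `fetch` injectable lets tests exercise the "already present" path without network access.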

Our tree builder stopped splitting as soon as the randomly chosen
feature had no variance, whereas sklearn tries other features first. On
mammography (6 features) this caused 7% premature leaf exits, trapping 33%
of data points in oversized leaves and compressing the range of path lengths.
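
A minimal isolation-tree sketch of the fixed behavior, in Python for illustration rather than the project's Clojure builder: features are tried in random order, and a node becomes a leaf only when every feature is constant there or the depth limit is reached. Leaves add the usual c(size) correction from the original Isolation Forest formulation (sklearn uses the same one), so a premature leaf holding many points assigns all of them the same path length, which is the compression described above. The dict-based tree and function names are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build(points, depth, max_depth, rng):
    """Grow an isolation tree over a list of equal-length tuples."""
    n = len(points)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    features = list(range(len(points[0])))
    rng.shuffle(features)
    # Try features in random order; stop only if *every* feature is constant here.
    for f in features:
        lo = min(p[f] for p in points)
        hi = max(p[f] for p in points)
        if lo < hi:
            split = rng.uniform(lo, hi)
            left = [p for p in points if p[f] < split]
            right = [p for p in points if p[f] >= split]
            return {
                "feature": f,
                "split": split,
                "left": build(left, depth + 1, max_depth, rng),
                "right": build(right, depth + 1, max_depth, rng),
            }
    return {"size": n}  # all features constant: a genuine leaf


def path_length(x, node, depth=0):
    if "size" in node:
        # Leaf: add c(size) to estimate the depth of the subtree that was never grown.
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


# Feature 0 is constant. A builder that gave up after drawing feature 0
# would return a single leaf at the root; this one falls through to feature 1.
sample = [(1.0, float(i)) for i in range(8)]
tree = build(sample, 0, max_depth=3, rng=random.Random(42))
print(tree["feature"])  # 1
```

Here max_depth=3 follows the common ceil(log2(sample size)) limit for a sample of 8.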

Mammography AUC: 0.73 → 0.86 (on par with sklearn's 0.87)
ForestCover AUC: 0.80 → 0.87
Stable across sample sizes: AUC 0.86 from 64 through 2048 (was 0.49 at 2048)
No impact on scoring performance (only training is affected).

…RIBE MODEL

Enables end-to-end anomaly detection from any PostgreSQL client without
writing Clojure. Follows BigQuery ML CREATE MODEL syntax.

- sql.clj: regex-based parsing for model DDL statements, parse-model-options
  helper for the OPTIONS clause
- server.clj: :create-model/:drop-model DDL handlers, extensible model-type-map,
  fix encoded column unwrapping in resolve-anomaly-expressions
- sql_test.clj: parse-level tests for all 4 statements, full end-to-end test
  (CREATE TABLE → INSERT → CREATE MODEL → SHOW → DESCRIBE → ANOMALY_SCORE → DROP)
- Update docs: anomaly-detection.md, sql-interface.md, README.md
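
A sketch of regex-based parsing for the four model statements, in Python rather than the project's Clojure. The grammar is an assumption loosely modeled on BigQuery ML (`CREATE MODEL name OPTIONS(...) AS query`); `parse_statement`, this version of `parse_model_options` (named after the PR's helper but not its implementation), the `isolation_forest` model type, and the registry are hypothetical. The non-greedy OPTIONS match stops at the first `)`, so option values containing parentheses are not supported.

```python
import re

CREATE_MODEL = re.compile(
    r"^\s*CREATE\s+MODEL\s+(?P<name>\w+)\s+OPTIONS\s*\((?P<opts>.*?)\)"
    r"\s+AS\s+(?P<query>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
DROP_MODEL = re.compile(
    r"^\s*DROP\s+MODEL\s+(?P<if_exists>IF\s+EXISTS\s+)?(?P<name>\w+)\s*;?\s*$",
    re.IGNORECASE,
)
SHOW_MODELS = re.compile(r"^\s*SHOW\s+MODELS\s*;?\s*$", re.IGNORECASE)
DESCRIBE_MODEL = re.compile(r"^\s*DESCRIBE\s+MODEL\s+(?P<name>\w+)\s*;?\s*$", re.IGNORECASE)
# key = 'quoted string' (with '' as an escaped quote) or a bare token
OPTION = re.compile(r"(\w+)\s*=\s*('(?:[^']|'')*'|[^,\s]+)")


def parse_model_options(text):
    """Turn "a='x', n=100, p=0.05" into {'a': 'x', 'n': 100, 'p': 0.05}."""
    opts = {}
    for key, raw in OPTION.findall(text):
        if raw.startswith("'"):
            value = raw[1:-1].replace("''", "'")
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw
        opts[key.lower()] = value
    return opts


def parse_statement(sql):
    if m := CREATE_MODEL.match(sql):
        return {"type": "create-model", "name": m["name"],
                "options": parse_model_options(m["opts"]), "query": m["query"]}
    if m := DROP_MODEL.match(sql):
        return {"type": "drop-model", "name": m["name"],
                "if_exists": m["if_exists"] is not None}
    if SHOW_MODELS.match(sql):
        return {"type": "show-models"}
    if m := DESCRIBE_MODEL.match(sql):
        return {"type": "describe-model", "name": m["name"]}
    return None  # not a model statement: fall through to the regular SQL parser


# Hypothetical extensible registry: model_type option -> trainer function.
MODEL_TYPES = {}


def register_model_type(name, trainer):
    MODEL_TYPES[name.lower()] = trainer


def trainer_for(options):
    model_type = str(options.get("model_type", "")).lower()
    try:
        return MODEL_TYPES[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type {model_type!r}") from None
```

Adding a new model type then only requires one `register_model_type` call, without touching the parser.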

Anomaly scoring now runs inside q/q after JOIN materialization instead of
eagerly in server.clj. This enables scoring over join results, index-backed
columns, and arbitrary expressions. The short form uses the model's stored
feature names; the long form maps positional arguments to features, evaluating
each argument as an expression.
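
A sketch of how the short and long forms of ANOMALY_SCORE might resolve features for each row once joins have been materialized. The `Model` type, the function names, and representing argument expressions as Python callables are all assumptions for illustration, not the project's q/q internals.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Row = Mapping[str, float]
Expr = Callable[[Row], float]


@dataclass
class Model:
    name: str
    features: Sequence[str]  # feature names stored at CREATE MODEL time
    score: Callable[[Sequence[float]], float]


def feature_vector(model: Model, row: Row, args: Sequence[Expr] = ()) -> list:
    if not args:
        # Short form: look up the model's stored feature names as columns.
        missing = [f for f in model.features if f not in row]
        if missing:
            raise KeyError(f"columns not found for {model.name}: {missing}")
        return [row[f] for f in model.features]
    # Long form: the i-th argument expression feeds the i-th feature.
    if len(args) != len(model.features):
        raise ValueError(
            f"{model.name} expects {len(model.features)} arguments, got {len(args)}")
    return [expr(row) for expr in args]


def anomaly_score(model: Model, rows: Sequence[Row], args: Sequence[Expr] = ()) -> list:
    # Evaluated per row of the already-joined result, so any column or expression works.
    return [model.score(feature_vector(model, row, args)) for row in rows]


m = Model("fraud", ["amount", "hour"], score=sum)
rows = [{"amount": 50, "hour": 12, "fee": 5}]
print(anomaly_score(m, rows))  # [62]
print(anomaly_score(m, rows, (lambda r: r["amount"] + r["fee"], lambda r: r["hour"])))  # [67]
```

The toy `score=sum` only makes the feature mapping visible; a real model would compute an isolation-forest score from the same vector.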

whilo force-pushed the fix/iforest-benchmark-tests branch from f74d299 to 1964174 on April 14, 2026 00:46
whilo merged commit 8a26a1b into main on Apr 14, 2026
5 of 6 checks passed
whilo deleted the fix/iforest-benchmark-tests branch on April 14, 2026 00:49

Labels: None yet
Projects: None yet
1 participant