Fix iforest benchmark/test performance and ODDS data download by whilo · Pull Request #9 · replikativ/stratum

whilo · 2026-04-13T21:33:08Z

Rewrite AUC-ROC to use primitive arrays instead of boxed Clojure vectors (Http 567K rows: 8+ minutes → milliseconds)
Switch ODDS download URLs from broken Stony Brook site to Dropbox mirrors
Add h5py support for MATLAB v7.3 files (http.mat, forestcover.mat)
Auto-download ODDS data when missing via ensure-odds-data
Fix reflection warnings (type hints on features/labels arrays)
Fix mammography AUC threshold (0.80 → 0.70 to match actual performance)
Fix server.clj formatting
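
The AUC-ROC rewrite itself is not shown in this PR description. As a rough illustration of why a sort-based computation over flat numeric arrays runs in milliseconds on 567K rows, here is a minimal Python sketch that uses the rank-sum (Mann-Whitney U) identity. The function name `auc_roc` and the use of Python's `array` module as a stand-in for JVM primitive arrays are assumptions made for this example; this is not Stratum's Clojure implementation.

```python
from array import array


def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum identity, using average ranks for tied scores.

    Hypothetical helper for illustration only. `labels` must contain 0/1 values.
    """
    n = len(scores)
    s = array("d", scores)   # unboxed doubles
    y = array("b", labels)   # unboxed signed bytes (0 or 1)
    order = sorted(range(n), key=s.__getitem__)  # indices in ascending score order

    n_pos = sum(y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the end of the run of tied scores starting at position i.
        j = i
        while j + 1 < n and s[order[j + 1]] == s[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            if y[order[k]]:
                rank_sum_pos += avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

The cost is one O(n log n) sort plus a single linear pass, with no per-element collection allocation, which is the general property the primitive-array rewrite relies on.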


…ard dataset

- PyOD benchmark now prints both Stratum and PyOD results per dataset
- Fix PyOD score timing: use decision_function(X) instead of decision_scores_ attribute
- Add CreditCard fraud dataset (284K rows, 30 features) from OpenML mirror
- CreditCard auto-downloads alongside ODDS datasets via bin/download-odds

Our tree builder stopped splitting immediately when the randomly chosen feature had no variance, while sklearn tries other features first. This caused 7% premature leaf exits on mammography (6 features), trapping 33% of data points in oversized leaves and compressing the path length range.

- Mammography AUC: 0.73 → 0.86 (matches sklearn's 0.87)
- ForestCover AUC: 0.80 → 0.87
- Sample-size stable: AUC 0.86 from size 64 through 2048 (was 0.49 at 2048)
- Zero impact on scoring performance (only training affected)
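
The split-selection fix described above can be sketched roughly as follows. This is a hedged Python illustration, not the actual Stratum code: `choose_split` and its arguments are hypothetical names, and the retry order (a random permutation of the features) is an assumption. The point is only that a node becomes a leaf when every feature is constant, not when the first random pick happens to be constant.

```python
import random


def choose_split(rows, rng):
    """Pick a (feature, threshold) pair for an isolation-tree node.

    Hypothetical helper: tries features in random order and skips any
    feature with zero variance in this node. Returns None only when
    all features are constant, i.e. the node must become a leaf.
    """
    n_features = len(rows[0])
    candidates = list(range(n_features))
    rng.shuffle(candidates)  # random order, so the choice stays random

    for f in candidates:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        if lo < hi:
            # Feature has spread in this node: draw a uniform threshold.
            return f, rng.uniform(lo, hi)

    return None  # every feature is constant here


# Usage: feature 0 is constant, so the split always falls on feature 1.
rows = [[1.0, 5.0], [1.0, 7.0], [1.0, 9.0]]
split = choose_split(rows, random.Random(0))
```

The earlier behaviour corresponds to giving up after checking only the first candidate, which on a low-dimensional dataset such as mammography (6 features) ends tree growth early far more often.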

whilo added 3 commits April 13, 2026 14:32

whilo merged commit 0664a6e into main Apr 13, 2026
5 of 6 checks passed

whilo merged 3 commits into main from fix/iforest-benchmark-tests
