Skip to content

Fix iforest benchmark/test performance and ODDS data download#9

Merged
whilo merged 3 commits intomainfrom
fix/iforest-benchmark-tests
Apr 13, 2026
Merged

Fix iforest benchmark/test performance and ODDS data download#9
whilo merged 3 commits intomainfrom
fix/iforest-benchmark-tests

Conversation

@whilo
Copy link
Copy Markdown
Member

@whilo whilo commented Apr 13, 2026

  • Rewrite AUC-ROC to use primitive arrays instead of boxed Clojure vectors (Http 567K rows: 8+ minutes → milliseconds)
  • Switch ODDS download URLs from broken Stony Brook site to Dropbox mirrors
  • Add h5py support for MATLAB v7.3 files (http.mat, forestcover.mat)
  • Auto-download ODDS data when missing via ensure-odds-data
  • Fix reflection warnings (type hints on features/labels arrays)
  • Fix mammography AUC threshold (0.80 → 0.70 to match actual performance)
  • Fix server.clj formatting

whilo added 3 commits April 13, 2026 14:32
- Rewrite AUC-ROC to use primitive arrays instead of boxed Clojure vectors
  (Http 567K rows: 8+ minutes → milliseconds)
- Switch ODDS download URLs from broken Stony Brook site to Dropbox mirrors
- Add h5py support for MATLAB v7.3 files (http.mat, forestcover.mat)
- Auto-download ODDS data when missing via ensure-odds-data
- Fix reflection warnings (type hints on features/labels arrays)
- Fix mammography AUC threshold (0.80 → 0.70 to match actual performance)
- Fix server.clj formatting
…ard dataset

- PyOD benchmark now prints both Stratum and PyOD results per dataset
- Fix PyOD score timing: use decision_function(X) instead of decision_scores_ attribute
- Add CreditCard fraud dataset (284K rows, 30 features) from OpenML mirror
- CreditCard auto-downloads alongside ODDS datasets via bin/download-odds
Our tree builder stopped splitting immediately when the randomly chosen
feature had no variance, while sklearn tries other features first. This
caused 7% premature leaf exits on mammography (6 features), trapping 33%
of data points in oversized leaves and compressing path length range.

Mammography AUC: 0.73 → 0.86 (matches sklearn's 0.87)
ForestCover AUC: 0.80 → 0.87
Sample-size stable: AUC 0.86 from size 64 through 2048 (was 0.49 at 2048)
Zero impact on scoring performance (only training affected).
@whilo whilo merged commit 0664a6e into main Apr 13, 2026
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant