Kaggle Competition - Predict NCAA Tournament outcomes using advanced ML ensemble methods, ELO ratings, Stacking, and Seed Override strategies.
This project tackles the March Machine Learning Mania 2026 Kaggle competition, which challenges participants to predict the probability of each possible matchup in the NCAA Men's and Women's Basketball Tournaments.
The pipeline combines feature engineering, custom ELO rating systems, LightGBM/XGBoost/CatBoost/LR ensemble stacking, Optuna hyperparameter optimization, Massey Ordinals, and Seed Override strategies to generate calibrated win probabilities for every potential game, evaluated using the Brier Score.
kaggle_mania/
├── notebooks/                  # Notebooks for exploratory analysis (EDA)
│
├── output/
│   └── submission.csv          # Final Kaggle submission file
│
└── src/
    ├── catboost_info/          # CatBoost training logs & metadata
    ├── config.py               # Global configuration (paths, hyperparameters, seeds)
    ├── data_loading.py         # Raw data ingestion and preprocessing
    ├── dataset_builder.py      # Feature matrix construction for train/test
    ├── elo_rating.py           # Custom ELO with MOV, custom prior and pre-tourney
    ├── ensemble.py             # Blending and Stacking ensemble logic
    ├── feature_engineering.py  # Advanced feature creation (stats, ELO PreTourney, seeds)
    ├── model_training.py       # Model training with Optuna hyperparameter tuning
    ├── monte_carlo.py          # Monte Carlo tournament bracket simulations
    ├── submission.py           # Formats submission CSV with Seed Override
    ├── validation.py           # Temporal CV splits and Brier Score evaluation
    └── main.py                 # Pipeline entrypoint, runs end-to-end
Raw NCAA historical data (Men's + Women's) is loaded, cleaned, and structured into season-level and game-level datasets. Massey Ordinals are filtered to the 15 most predictive ranking systems.
A custom ELO implementation with three key innovations:
- Margin of Victory (MOV), FiveThirtyEight-style multiplier: a win by 20 points increases a team's rating more than a win by 2
- Custom Prior, teams don't start every season at the same rating; historically strong teams start higher, based on their win percentage across all historical data
- ELO PreTourney, captures each team's ELO at its last regular-season game, before the tournament begins, a cleaner signal than end-of-season ELO

Two standard adjustments are also applied:
- Home court adjustment, ±100 ELO points for home/away games
- Soft reset between seasons, 75% carry-forward + 25% regression to the custom prior
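
The rating mechanics described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/elo_rating.py`: the function names are hypothetical, and the logarithmic MOV multiplier is an assumed form in the FiveThirtyEight style, since the exact formula used by the project is not stated here. `ELO_K` and `ELO_HOME_ADVANTAGE` take the values listed in `src/config.py`.

```python
import math

ELO_K = 20                 # value listed in src/config.py
ELO_HOME_ADVANTAGE = 100   # value listed in src/config.py


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard ELO win expectancy of team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def mov_multiplier(margin: int, winner_elo_diff: float) -> float:
    """Assumed log-style margin-of-victory multiplier (hypothetical form).

    Grows with the margin and shrinks when the winner was already
    the stronger team, which damps runaway ratings.
    """
    return math.log(abs(margin) + 1.0) * 2.2 / (winner_elo_diff * 0.001 + 2.2)


def update_elo(winner: float, loser: float, margin: int, winner_home: int = 0):
    """Return updated (winner, loser) ratings after one game.

    winner_home: +1 if the winner played at home, -1 if away, 0 if neutral.
    """
    adjusted = winner + winner_home * ELO_HOME_ADVANTAGE
    expected = expected_score(adjusted, loser)
    delta = ELO_K * mov_multiplier(margin, adjusted - loser) * (1.0 - expected)
    return winner + delta, loser - delta


def season_reset(rating: float, prior: float, carry: float = 0.75) -> float:
    """Soft reset: 75% carry-forward plus 25% regression to the team's prior."""
    return carry * rating + (1.0 - carry) * prior
```

With two equally rated teams, `update_elo(1500, 1500, 20)` moves the winner further than `update_elo(1500, 1500, 2)`, and a home win is rewarded less than the same win on a neutral court.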
73 features engineered per matchup, including:
- Season averages (offensive/defensive efficiency, tempo, FG%, 3PT%, AST/TO)
- Tournament seeding differentials
- ELO delta, absolute ELO, and ELO PreTourney for each team
- Recent form, rolling window of last 14 games
- Historical tournament wins (cumulative, no leakage)
- Strength of Schedule
- Massey Ordinals, 15 systems: POM, SAG, MOR, COL, DOL, WLK, ARG, BPI, RPI, KPI, RTH, DCI, REW, AP, USA
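
As an illustration of the recent-form and differential features, here is a small sketch under stated assumptions: the helper names are hypothetical, and the rolling mean is computed only from games played before the current one, which is one common way to avoid leakage. The window size matches `ROLLING_WINDOW` in `src/config.py`.

```python
from collections import deque

ROLLING_WINDOW = 14  # value listed in src/config.py


def rolling_score_diff(margins, window=ROLLING_WINDOW):
    """For each game, mean scoring margin over the previous `window` games.

    The current game is excluded from its own feature. The first game
    has no history, so its value is None.
    """
    recent = deque(maxlen=window)
    features = []
    for margin in margins:
        features.append(sum(recent) / len(recent) if recent else None)
        recent.append(margin)
    return features


def diff_feature(team_a_value: float, team_b_value: float) -> float:
    """Matchup differential (Team A minus Team B), e.g. for Diff_Rolling_ScoreDiff."""
    return team_a_value - team_b_value
```

For example, `rolling_score_diff([10, -4, 6], window=2)` gives `[None, 10.0, 3.0]`.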
Four base models trained with temporal cross-validation (9 folds, one season per fold):
| Model | Role |
|---|---|
| LightGBM | Primary boosting model |
| XGBoost | Diversity in boosting approach |
| CatBoost | Robust to noisy features |
| Logistic Regression | Strong linear baseline, consistently best single model |
All GBMs use early stopping (50 rounds) to prevent overfitting.
Bayesian optimization via Optuna runs 30 trials each for LightGBM and CatBoost, using the first temporal fold as the optimization target. Parameters tuned: learning_rate, max_depth, num_leaves, subsample, colsample_bytree, reg_alpha, reg_lambda.
Instead of a simple weighted average, a meta-model (Logistic Regression) is trained on the Out-of-Fold (OOF) predictions of the 4 base models. This learns the optimal combination weights automatically, consistently outperforming manual blending.
Meta-model learned weights (final submission):
lr: 3.71 <- dominant signal
lgb: 2.07
xgb: 0.78
cat: 0.29
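
To show how weights like these turn base predictions into one stacked probability, here is a rough sketch. It assumes the meta-model takes the base models' probabilities directly as inputs (it may instead use logits), and the intercept is a hypothetical value, chosen only so that four base predictions of 0.5 map to 0.5; the fitted intercept is not reported above.

```python
import math

# Meta-model weights as reported for the final submission.
META_WEIGHTS = {"lr": 3.71, "lgb": 2.07, "xgb": 0.78, "cat": 0.29}

# Hypothetical intercept (assumption): centres the model so that
# all-0.5 inputs yield a stacked probability of 0.5.
META_INTERCEPT = -0.5 * sum(META_WEIGHTS.values())


def stack_predict(base_preds: dict) -> float:
    """Combine base-model probabilities with a logistic meta-model."""
    z = META_INTERCEPT + sum(
        weight * base_preds[name] for name, weight in META_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))
```

Because the `lr` weight is the largest, moving the Logistic Regression prediction shifts the stacked output far more than moving the CatBoost prediction by the same amount.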
A calibrator is fitted on the stacked OOF predictions. It is applied only if it improves the Brier Score, which avoids distorting probabilities that are already well calibrated.
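
The "apply only if it helps" rule can be sketched as below. The function names are hypothetical and the calibrator is passed in as a plain callable, since the type of calibrator the project uses is not specified here.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between outcomes (0/1) and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def apply_if_better(y_true, y_prob, calibrate):
    """Return (predictions, applied): calibrated only when Brier strictly improves."""
    calibrated = [calibrate(p) for p in y_prob]
    if brier_score(y_true, calibrated) < brier_score(y_true, y_prob):
        return calibrated, True
    return list(y_prob), False


def shrink(p):
    """Hypothetical calibrator for the example: pull predictions toward 0.5."""
    return 0.5 + 0.8 * (p - 0.5)
```

With overconfident predictions such as `apply_if_better([1, 0, 1, 0], [0.99, 0.9, 0.6, 0.1], shrink)` the calibrated values are kept, while already perfect predictions are returned unchanged.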
Conservative probability clipping for extreme seed matchups, protecting against catastrophic Brier Score penalties from upsets:
| Matchup | Favorite probability range |
|---|---|
| Seed 1 vs Seed 16 | 0.82 to 0.93 |
| Seed 2 vs Seed 15 | 0.78 to 0.93 |
| Seed 1 vs Seed 15 | 0.78 to 0.93 |
Seeds are loaded directly from MNCAATourneySeeds.csv; 56 matchups are protected in the final submission.
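
A minimal sketch of the override, using the ranges from the table above. The function name and signature are hypothetical; the only assumption about the data is that each prediction is the probability that team A wins and that both seeds are known as integers.

```python
# Favorite's allowed probability range per (better seed, worse seed) pair.
SEED_OVERRIDE = {
    (1, 16): (0.82, 0.93),
    (2, 15): (0.78, 0.93),
    (1, 15): (0.78, 0.93),
}


def seed_override(prob_a: float, seed_a: int, seed_b: int) -> float:
    """Clamp the favorite's win probability for protected seed matchups.

    prob_a is P(team A wins). Matchups outside the table pass through unchanged.
    """
    key = (min(seed_a, seed_b), max(seed_a, seed_b))
    if key not in SEED_OVERRIDE:
        return prob_a
    low, high = SEED_OVERRIDE[key]
    a_is_favorite = seed_a < seed_b
    favorite_prob = prob_a if a_is_favorite else 1.0 - prob_a
    favorite_prob = min(max(favorite_prob, low), high)
    return favorite_prob if a_is_favorite else 1.0 - favorite_prob
```

The upper bound limits the squared-error cost of an upset: if a 16 seed wins, a prediction of 0.93 for the favorite costs about 0.865, versus about 0.980 at 0.99.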
Final probabilities formatted to Kaggle spec: one row per possible matchup (ID, Pred) for both Men's and Women's tournaments, 132,133 rows total. Predictions clipped to [0.05, 0.95].
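
The formatting step might look roughly like this. It assumes the competition's usual ID layout of `Season_LowerTeamID_HigherTeamID`, with `Pred` being the probability that the lower-ID team wins; the team IDs in the usage note are placeholders and the function names are hypothetical.

```python
import csv
import io


def clip(p: float, low: float = 0.05, high: float = 0.95) -> float:
    """Clip a probability to the [0.05, 0.95] range used for the submission."""
    return min(max(p, low), high)


def write_submission(preds: dict, season: int, out) -> None:
    """Write ID,Pred rows to a file-like object.

    preds maps (team_a, team_b) -> P(team_a wins). Pairs are reordered so
    the lower team ID comes first, flipping the probability when needed.
    """
    rows = {}
    for (team_a, team_b), prob in preds.items():
        if team_a > team_b:
            team_a, team_b, prob = team_b, team_a, 1.0 - prob
        rows[(team_a, team_b)] = clip(prob)

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Pred"])
    for team_a, team_b in sorted(rows):
        writer.writerow([f"{season}_{team_a}_{team_b}", f"{rows[(team_a, team_b)]:.4f}"])
```

Calling `write_submission({(1102, 1101): 0.99}, 2026, buffer)` on an `io.StringIO()` buffer produces the row `2026_1101_1102,0.0500`: the pair is reordered, the probability is flipped to 0.01, then clipped up to 0.05.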
| Metric | Description |
|---|---|
| Brier Score | Primary competition metric: mean squared error of the predicted probabilities |
| AUC-ROC | Discrimination ability across all thresholds |
Validation uses temporal splits, training on earlier seasons and validating on later ones, to simulate real prediction scenarios and prevent data leakage.
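
One way to generate such splits is sketched below. It assumes an expanding window, where each fold validates on a single season and trains on every earlier season; the exact scheme in `src/validation.py` may differ, and the function name is hypothetical.

```python
def temporal_folds(seasons, n_folds: int = 9):
    """Yield (train_seasons, valid_season) pairs, one held-out season per fold.

    The last `n_folds` seasons are each used once for validation, and every
    fold trains only on seasons strictly before its validation season.
    """
    ordered = sorted(set(seasons))
    if len(ordered) <= n_folds:
        raise ValueError("need more seasons than folds")
    for valid_season in ordered[-n_folds:]:
        train_seasons = [s for s in ordered if s < valid_season]
        yield train_seasons, valid_season
```

With seasons 2010 through 2025, the first fold trains on 2010 to 2016 and validates on 2017, and the last fold validates on 2025. No fold ever sees a season later than the one it is scored on.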
| Version | OOF Brier Score | Key Change |
|---|---|---|
| V1 Baseline | ~0.220 | 5 Massey systems, no ELO prior |
| V2 Massey Expanded | ~0.215 | 15 Massey systems |
| V3 Optuna | ~0.210 | Optuna on LGB + CatBoost |
| V4 ELO PreTourney | ~0.110 | ELO PreTourney feature |
| V5 Stacking | ~0.090 | Meta-model stacking |
| V6 ELO Prior | ~0.088 | Custom ELO prior |
| V7 Seed Override | ~0.088 | Seed Override + updated data |
| V8 Final | 0.07667 | Brier Score optimization + calibration |
Requires Python >= 3.10.

Install dependencies with `pip install -r requirements.txt`, then run `python src/main.py`. The pipeline will:
- Load and preprocess all data (Men's + Women's)
- Build custom ELO prior from historical win rates
- Compute ELO ratings and ELO PreTourney across all seasons
- Engineer 73 features per matchup
- Run Optuna optimization (30 trials each for LGB and CatBoost)
- Train 4 base models with temporal cross-validation (9 folds)
- Fit Stacking meta-model on OOF predictions
- Apply OOF calibration if it improves Brier Score
- Apply Seed Override for extreme matchups
- Write the final predictions to output/submission.csv
All key parameters live in src/config.py:
SEED = 33
ELO_INITIAL = 1500
ELO_K = 20
ELO_HOME_ADVANTAGE = 100
ROLLING_WINDOW = 14
STAGE = 2 # 1 = development, 2 = final submission
GENDER = "M" # M = Men's, W = Women's

| Rank | Feature | Description |
|---|---|---|
| 1 | Diff_ELO | ELO rating differential |
| 2 | Diff_ELO_PreTourney | ELO differential at tournament entry |
| 3 | Diff_Seed | Seed number differential |
| 4 | Diff_HistTourneyWins | Historical tournament wins differential |
| 5 | A_HistTourneyWins | Team A historical tournament wins |
| 6 | B_Seed | Team B seed number |
| 7 | Diff_Rolling_ScoreDiff | Recent scoring margin differential |
| 8 | Diff_Massey_DCI | Massey DCI system differential |
| 9 | A_Seed | Team A seed number |
| 10 | Diff_Stl_mean | Steals differential |
- ELO rating system with MOV (FiveThirtyEight style)
- Custom ELO prior based on historical win rates
- ELO PreTourney feature
- Massey Ordinals, 15 systems
- LGB + XGB + CatBoost + LR ensemble
- Optuna hyperparameter optimization
- Stacking with meta-model
- OOF calibration
- Seed Override for extreme matchups
- Brier Score optimization (competition metric)
- Men's + Women's combined submission
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Davis Denner
Data Scientist · Kaggle Enthusiast
GitHub · LinkedIn
This project is licensed under the MIT License.
In data we trust, in brackets we fight, March Madness 2026