humphreyhhui/README.md

Hi, I'm Humphrey :)

Master's in Business Analytics @ UT Austin • Philosophy and Public Affairs @ Claremont McKenna

Grew up in Hong Kong and England

I love playing soccer, chess, and fantasy sports


UT Austin / McCombs

Master's in Business Analytics

Data Projects

Austin House Price Prediction
  • Predicted Austin housing prices by engineering 14 features (age, sqft_ratio, bath_bed_weighted, lot_category, interaction terms), applying log transformations to price and square footage to reduce skewness, and treating zipcode and amenities as categorical factors; zipcodes in the holdout set unseen during training were replaced with the most common training zipcode.
  • Compared five models on log-transformed prices using 5-fold cross-validation: Bagging, Random Forest, XGBoost, BART, and a Pruned Regression Tree. Selected Bagging as the final model for its lowest RMSE and its resistance to overfitting through ensemble averaging while maintaining low bias.
  • Tech Stack: R (tidyverse, tidymodels, rpart, randomForest, BART, xgboost)
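A minimal Python sketch of the model-comparison step (the project itself was written in R with tidymodels; the data, features, and model set here are synthetic stand-ins):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(800, 4000, n)
age = rng.uniform(0, 60, n)
price = 50_000 + 150 * sqft - 500 * age + rng.normal(0, 20_000, n)

X = np.column_stack([np.log(sqft), age])  # log-transform sqft to reduce skew
y = np.log(price)                         # model log(price), as in the project

# compare models by mean RMSE across 5-fold cross-validation
models = {
    "bagging": BaggingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(random_state=0),
}
rmse = {
    name: -cross_val_score(m, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()
    for name, m in models.items()
}
best = min(rmse, key=rmse.get)  # pick the model with the lowest CV RMSE
```

The same pattern extends to the full five-model comparison; RMSE here is on the log scale, so it must be exponentiated back for dollar-denominated errors.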
Credit Card Fraud Prediction
  • Built fraud detection models on an imbalanced Kaggle dataset (555K transactions, 99.6% non-fraud): resampled the training data to 78.6% non-fraud, engineered features including time-of-day indicators, distance calculations, and target-encoded merchants/categories, and removed highly correlated predictors (lat/long) identified through correlation heatmap analysis.
  • Compared four classification models on an 80/20 train-test split: a Classification Tree (entropy criterion, max depth 10) achieved the best performance (93% recall, 90% precision, 96% accuracy), outperforming Logistic Regression, Naïve Bayes, and KNN. Transaction amount dominated feature importance (72%), followed by the gas (6%) and groceries (2%) categories.
  • Tech Stack: pandas, matplotlib, scikit-learn

Finance Projects

Constructing Custom Equity Indexes
  • Built three custom stock indexes (equal-weighted, value-weighted, price-weighted) from top 100 stocks by market cap using CRSP data (2015-2024), incorporating delisting returns to avoid survivorship bias and implementing monthly reconstitution with strict t-1 information to prevent look-ahead bias.
  • Compared custom indexes against SPY, IWM, and QQQ using correlation analysis and log returns, computed HHI scores for sector diversification analysis, and tested robustness across different market conditions by resetting indexes during 2020, 2022, and 2023 periods.
  • Tech Stack: pandas, numpy, matplotlib, wrds, scipy, seaborn
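The three weighting schemes can be sketched in a few lines of pandas on a toy cross-section (tickers and share counts are made up; the real project drew the top-100 universe from CRSP):

```python
import pandas as pd

# hypothetical cross-section at one monthly rebalance date
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "price": [100.0, 50.0, 25.0],
    "shares_out": [1e9, 2e9, 8e9],   # shares outstanding
})
df["mktcap"] = df["price"] * df["shares_out"]

# equal-weighted: 1/N per stock
w_equal = pd.Series(1 / len(df), index=df["ticker"])
# value-weighted: proportional to market cap
w_value = (df["mktcap"] / df["mktcap"].sum()).set_axis(df["ticker"])
# price-weighted: proportional to share price (Dow-style)
w_price = (df["price"] / df["price"].sum()).set_axis(df["ticker"])
```

At each monthly reconstitution only t-1 information (prior-month prices and shares outstanding) should enter the weights, which is what prevents look-ahead bias.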
Minimum Variance Portfolio Optimization
  • Optimized Global Minimum Variance portfolios comparing sample covariance vs. Ledoit-Wolf shrinkage methods across rolling backtesting periods (4-60 months), with interactive stock selection and user-defined weight constraints validated through coverage checks for data quality.
  • Implemented quadratic optimization using CVXPY (OSQP solver) with enforced covariance matrix symmetry, demonstrating Ledoit-Wolf's superior out-of-sample performance through analysis of cumulative returns, portfolio variance, weight stability, and monthly turnover metrics.
  • Tech Stack: pandas, numpy, matplotlib, wrds, scipy, seaborn, cvxpy
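A simplified illustration of the covariance comparison on synthetic returns. The project solved a constrained problem with CVXPY, but the unconstrained global-minimum-variance weights have a closed form, shown here alongside scikit-learn's Ledoit-Wolf estimator:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(60, 5))  # 60 months x 5 assets (synthetic)

def gmv_weights(cov):
    """Unconstrained GMV weights: w proportional to inv(Sigma) @ 1."""
    inv = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv @ ones
    return w / w.sum()

cov_sample = np.cov(returns, rowvar=False)       # sample covariance
cov_lw = LedoitWolf().fit(returns).covariance_   # shrunk covariance

w_sample = gmv_weights(cov_sample)
w_lw = gmv_weights(cov_lw)
```

Shrinkage pulls the extreme entries of the sample covariance toward a structured target, which typically stabilizes the weights out of sample; adding weight bounds turns this into the quadratic program the project solved with OSQP.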
Testing Asset Pricing Models
  • Tested CAPM and Fama-French 3-Factor models on 25 size/book-to-market portfolios (1990-2024) using time-series regressions to estimate alphas and betas, GRS F-tests to jointly test alpha significance, and Fama-MacBeth cross-sectional regressions with Shanken corrections to measure factor risk premia.
  • Found FF3F explained 94% of portfolio variance (vs. 70% for CAPM) with lower average alpha, though only market risk factor achieved statistical significance; high book-to-market portfolios showed 42% higher mean returns, with small-cap portfolios exhibiting greater variance and lower model fit.
  • Tech Stack: pandas, numpy, matplotlib, statsmodels, scipy
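A stripped-down sketch of one time-series regression on simulated excess returns (the project ran these for all 25 portfolios and added GRS and Fama-MacBeth tests on top):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 420  # monthly observations, roughly 1990-2024
mkt = rng.normal(0.006, 0.04, T)  # market excess return (synthetic)
true_beta, true_alpha = 1.2, 0.001
# simulated portfolio excess return under a CAPM data-generating process
port = true_alpha + true_beta * mkt + rng.normal(0, 0.01, T)

# OLS of portfolio excess return on a constant and the market factor:
# the intercept is the CAPM alpha, the slope the beta
X = np.column_stack([np.ones(T), mkt])
(alpha, beta), *_ = np.linalg.lstsq(X, port, rcond=None)
r2 = 1 - ((port - X @ [alpha, beta]) ** 2).sum() \
       / ((port - port.mean()) ** 2).sum()
```

Stacking SMB and HML columns into `X` gives the FF3F version; the GRS test then asks whether the 25 estimated alphas are jointly zero.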

MIT

Applied Data Science Program

Capstone Project
  • Applied multiple segmentation techniques to group customers into clusters, and evaluated each technique using silhouette scores and detailed cluster profiles.
  • Designed and presented a professional presentation summarizing analytical processes, key findings, and practical business recommendations for stakeholders.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
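Silhouette scoring of a clustering solution can be sketched on synthetic data like this (the capstone used real customer data and compared several techniques):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
# two well-separated synthetic "customer segments"
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# silhouette near 1 => compact, well-separated clusters; near 0 => overlap
score = silhouette_score(X, labels)
```

Computing this score across candidate cluster counts and algorithms is what lets the techniques be ranked objectively before profiling the winning clusters.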
Bank Customer Segmentation
  • Utilized clustering algorithms such as K-Means, K-Medoids, and Gaussian Mixture Models on scaled datasets to segment customer populations.
  • Evaluated clustering results by plotting SSE against cluster count to create and analyze elbow plots for determining optimal clustering solutions.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
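A minimal sketch of the elbow computation (SSE via KMeans inertia across cluster counts) on synthetic data with three true segments:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# three synthetic clusters centered at 0, 4, and 8
X = np.vstack([rng.normal(c, 0.3, (80, 2)) for c in (0, 4, 8)])

# SSE (inertia) for k = 1..6; plotting these values gives the elbow plot
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)}
```

SSE falls steeply up to the true number of clusters (here 3) and then flattens; the bend in the curve is the "elbow" used to choose k.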
Car Dealership Dimension Reduction
  • Preprocessed raw data and applied dimensionality reduction techniques, including t-SNE and PCA, to effectively segment dealership customers.
  • Conducted comprehensive variable analysis for each segment to uncover key behavioral and demographic characteristics.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
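A small PCA sketch on synthetic correlated features (the project also applied t-SNE, and worked on real dealership data rather than the made-up factor structure below):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
# 5 observed features driven by 2 latent factors plus small noise
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + rng.normal(0, 0.05, (300, 5))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                          # 2-D projection
explained = pca.explained_variance_ratio_.sum()   # variance retained
```

When a low-dimensional structure genuinely underlies the data, two components retain nearly all the variance, and clustering on `Z` instead of the raw columns is both cheaper and less noisy.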
Foodhub EDA
  • Created univariate and bivariate visualizations using Matplotlib and Seaborn to identify trends and patterns in customer behavior.
  • Conducted in-depth analysis of data insights and presented actionable recommendations to optimize business operations and customer satisfaction.
  • Tech Stack: pandas, numpy, matplotlib, seaborn
