humphreyhhui/README.md

Hi, I'm Humphrey :)

Master's in Business Analytics @ UT Austin • Philosophy and Public Affairs @ Claremont McKenna

Grew up in Hong Kong and England

I love playing soccer, chess, and fantasy sports


UT Austin / McCombs

Master's in Business Analytics

Data Projects

Austin House Price Prediction
  • Predicted Austin housing prices by engineering 14 features (age, sqft_ratio, bath_bed_weighted, lot_category, interaction terms), applying log transformations to price and square footage to reduce skewness, and treating zipcode and amenities as categorical factors; zipcodes in the holdout set unseen during training were replaced with the most common training zipcode.
  • Compared five models on log-transformed prices using 5-fold cross-validation: Bagging, Random Forest, XGBoost, BART, and a Pruned Regression Tree. Selected Bagging as the final model for its lowest RMSE and its resistance to overfitting through ensemble averaging while maintaining low bias.
  • Tech Stack: R (tidyverse, tidymodels, rpart, randomForest, BART, xgboost)
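A minimal Python sketch of the model-comparison step (the project itself was written in R with tidymodels; the data, features, and model set here are synthetic stand-ins):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(800, 4000, n)
age = rng.uniform(0, 60, n)
price = 50_000 + 150 * sqft - 500 * age + rng.normal(0, 20_000, n)

X = np.column_stack([np.log(sqft), age])  # log-transform sqft to reduce skew
y = np.log(price)                         # model log(price), as in the project

# compare models by mean RMSE across 5-fold cross-validation
models = {
    "bagging": BaggingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(random_state=0),
}
rmse = {
    name: -cross_val_score(m, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()
    for name, m in models.items()
}
best = min(rmse, key=rmse.get)  # pick the model with the lowest CV RMSE
```

The same pattern extends to the full five-model comparison; RMSE here is on the log scale, so it must be exponentiated back for dollar-denominated errors.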
Credit Card Fraud Prediction
  • Built fraud detection models on an imbalanced Kaggle dataset (555K transactions, 99.6% non-fraud): resampled the training data to 78.6% non-fraud, engineered features including time-of-day indicators, distance calculations, and target-encoded merchants/categories, and removed highly correlated predictors (lat/long) identified through correlation heatmap analysis.
  • Compared four classification models on an 80/20 train-test split: a Classification Tree (entropy criterion, max depth 10) achieved the best performance (93% recall, 90% precision, 96% accuracy), outperforming Logistic Regression, Naïve Bayes, and KNN. Transaction amount dominated feature importance (72%), followed by the gas (6%) and groceries (2%) categories.
  • Tech Stack: pandas, matplotlib, scikit-learn

Finance Projects

Constructing Custom Equity Indexes
  • Built three custom stock indexes (equal-weighted, value-weighted, price-weighted) from top 100 stocks by market cap using CRSP data (2015-2024), incorporating delisting returns to avoid survivorship bias and implementing monthly reconstitution with strict t-1 information to prevent look-ahead bias.
  • Compared custom indexes against SPY, IWM, and QQQ using correlation analysis and log returns, computed HHI scores for sector diversification analysis, and tested robustness across different market conditions by resetting indexes during 2020, 2022, and 2023 periods.
  • Tech Stack: pandas, numpy, matplotlib, wrds, scipy, seaborn
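The three weighting schemes can be sketched in a few lines of pandas on a toy cross-section (tickers and share counts are made up; the real project drew the top-100 universe from CRSP):

```python
import pandas as pd

# hypothetical cross-section at one monthly rebalance date
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "price": [100.0, 50.0, 25.0],
    "shares_out": [1e9, 2e9, 8e9],   # shares outstanding
})
df["mktcap"] = df["price"] * df["shares_out"]

# equal-weighted: 1/N per stock
w_equal = pd.Series(1 / len(df), index=df["ticker"])
# value-weighted: proportional to market cap
w_value = (df["mktcap"] / df["mktcap"].sum()).set_axis(df["ticker"])
# price-weighted: proportional to share price (Dow-style)
w_price = (df["price"] / df["price"].sum()).set_axis(df["ticker"])
```

At each monthly reconstitution only t-1 information (prior-month prices and shares outstanding) should enter the weights, which is what prevents look-ahead bias.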
Minimum Variance Portfolio Optimization
  • Optimized Global Minimum Variance portfolios comparing sample covariance vs. Ledoit-Wolf shrinkage methods across rolling backtesting periods (4-60 months), with interactive stock selection and user-defined weight constraints validated through coverage checks for data quality.
  • Implemented quadratic optimization using CVXPY (OSQP solver) with enforced covariance matrix symmetry, demonstrating Ledoit-Wolf's superior out-of-sample performance through analysis of cumulative returns, portfolio variance, weight stability, and monthly turnover metrics.
  • Tech Stack: pandas, numpy, matplotlib, wrds, scipy, seaborn, cvxpy
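A simplified illustration of the covariance comparison on synthetic returns. The project solved a constrained problem with CVXPY, but the unconstrained global-minimum-variance weights have a closed form, shown here alongside scikit-learn's Ledoit-Wolf estimator:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(60, 5))  # 60 months x 5 assets (synthetic)

def gmv_weights(cov):
    """Unconstrained GMV weights: w proportional to inv(Sigma) @ 1."""
    inv = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv @ ones
    return w / w.sum()

cov_sample = np.cov(returns, rowvar=False)       # sample covariance
cov_lw = LedoitWolf().fit(returns).covariance_   # shrunk covariance

w_sample = gmv_weights(cov_sample)
w_lw = gmv_weights(cov_lw)
```

Shrinkage pulls the extreme entries of the sample covariance toward a structured target, which typically stabilizes the weights out of sample; adding weight bounds turns this into the quadratic program the project solved with OSQP.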
Testing Asset Pricing Models
  • Tested CAPM and Fama-French 3-Factor models on 25 size/book-to-market portfolios (1990-2024) using time-series regressions to estimate alphas and betas, GRS F-tests to jointly test alpha significance, and Fama-MacBeth cross-sectional regressions with Shanken corrections to measure factor risk premia.
  • Found FF3F explained 94% of portfolio variance (vs. 70% for CAPM) with lower average alpha, though only market risk factor achieved statistical significance; high book-to-market portfolios showed 42% higher mean returns, with small-cap portfolios exhibiting greater variance and lower model fit.
  • Tech Stack: pandas, numpy, matplotlib, statsmodels, scipy
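A stripped-down sketch of one time-series regression on simulated excess returns (the project ran these for all 25 portfolios and added GRS and Fama-MacBeth tests on top):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 420  # monthly observations, roughly 1990-2024
mkt = rng.normal(0.006, 0.04, T)  # market excess return (synthetic)
true_beta, true_alpha = 1.2, 0.001
# simulated portfolio excess return under a CAPM data-generating process
port = true_alpha + true_beta * mkt + rng.normal(0, 0.01, T)

# OLS of portfolio excess return on a constant and the market factor:
# the intercept is the CAPM alpha, the slope the beta
X = np.column_stack([np.ones(T), mkt])
(alpha, beta), *_ = np.linalg.lstsq(X, port, rcond=None)
r2 = 1 - ((port - X @ [alpha, beta]) ** 2).sum() \
       / ((port - port.mean()) ** 2).sum()
```

Stacking SMB and HML columns into `X` gives the FF3F version; the GRS test then asks whether the 25 estimated alphas are jointly zero.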

MIT

Applied Data Science Program

Capstone Project
  • Applied multiple segmentation techniques to group customers into clusters, and evaluated each technique using silhouette scores and detailed cluster profiles.
  • Designed and presented a professional presentation summarizing analytical processes, key findings, and practical business recommendations for stakeholders.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
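Silhouette scoring of a clustering solution can be sketched on synthetic data like this (the capstone used real customer data and compared several techniques):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
# two well-separated synthetic "customer segments"
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# silhouette near 1 => compact, well-separated clusters; near 0 => overlap
score = silhouette_score(X, labels)
```

Computing this score across candidate cluster counts and algorithms is what lets the techniques be ranked objectively before profiling the winning clusters.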
Bank Customer Segmentation
  • Utilized clustering algorithms such as K-Means, K-Medoids, and Gaussian Mixture Models on scaled datasets to segment customer populations.
  • Evaluated clustering results by plotting SSE against cluster count to create and analyze elbow plots for determining optimal clustering solutions.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
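A minimal sketch of the elbow computation (SSE via KMeans inertia across cluster counts) on synthetic data with three true segments:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# three synthetic clusters centered at 0, 4, and 8
X = np.vstack([rng.normal(c, 0.3, (80, 2)) for c in (0, 4, 8)])

# SSE (inertia) for k = 1..6; plotting these values gives the elbow plot
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)}
```

SSE falls steeply up to the true number of clusters (here 3) and then flattens; the bend in the curve is the "elbow" used to choose k.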
Car Dealership Dimension Reduction
  • Preprocessed raw data and applied dimensionality reduction techniques, including t-SNE and PCA, to effectively segment dealership customers.
  • Conducted comprehensive variable analysis for each segment to uncover key behavioral and demographic characteristics.
  • Tech Stack: pandas, numpy, matplotlib, seaborn, scikit-learn
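A small PCA sketch on synthetic correlated features (the project also applied t-SNE, and worked on real dealership data rather than the made-up factor structure below):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
# 5 observed features driven by 2 latent factors plus small noise
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + rng.normal(0, 0.05, (300, 5))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                          # 2-D projection
explained = pca.explained_variance_ratio_.sum()   # variance retained
```

When a low-dimensional structure genuinely underlies the data, two components retain nearly all the variance, and clustering on `Z` instead of the raw columns is both cheaper and less noisy.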
Foodhub EDA
  • Created univariate and bivariate visualizations using Matplotlib and Seaborn to identify trends and patterns in customer behavior.
  • Conducted in-depth analysis of data insights and presented actionable recommendations to optimize business operations and customer satisfaction.
  • Tech Stack: pandas, numpy, matplotlib, seaborn
