A machine learning project to identify customers likely to churn, generate churn risk scores, and provide actionable business insights for proactive retention strategies.
- Identify customers at risk of churn
- Generate churn probability scores
- Segment customers into risk groups
- Understand key drivers of churn
- Provide business recommendations for retention
The dataset is loaded from a telecom SQL database and contains customer usage patterns, plan details, service interactions, and churn labels.
Key Features
- Call minutes & charges (Day, Evening, Night, International)
- Customer service calls
- Voicemail & international plans
- Account length & usage behavior
- Target variable: CHURN_FLAG
Tech Stack
- Python
- Pandas, NumPy
- Scikit-learn
- Seaborn, Matplotlib
- PyMySQL
- Jupyter Notebook
Data Preprocessing
- Loaded telecom customer data from a SQL database
- Standardized churn labels
- Converted object columns to numeric
- Handled missing values with median imputation
- Created binary churn flag
- One-hot encoded categorical variables
- Scaled numerical features
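The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the raw label column name (`CHURN`), the feature names, and the helper functions `clean_frame` and `build_preprocessor` are hypothetical. Median imputation, scaling, and one-hot encoding are wrapped in a scikit-learn `ColumnTransformer` so they can be fit on the training split only, which avoids leaking test-set statistics.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def clean_frame(df, label_col, categorical_cols):
    """Standardize churn labels, build CHURN_FLAG, and coerce columns to numeric.

    Hypothetical helper: label_col and categorical_cols depend on your schema.
    """
    df = df.copy()
    # Normalize labels such as "True.", " true", "FALSE" to lowercase words
    labels = df.pop(label_col).astype(str).str.strip().str.rstrip(".").str.lower()
    # Binary churn flag: 1 = churned, 0 = retained
    df["CHURN_FLAG"] = labels.isin(["yes", "true", "1"]).astype(int)
    # Convert non-categorical text columns to numbers; unparseable values become NaN
    for col in df.columns:
        if col not in categorical_cols and not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def build_preprocessor(numeric_cols, categorical_cols):
    """Median imputation + scaling for numeric columns, one-hot for categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
```

Call `fit_transform` on the training features and `transform` on the test features so that medians and scaling statistics come from training data only.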
Two models were trained:
- Logistic Regression (baseline)
- Random Forest (final model)
Metrics used:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
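A sketch of training both models and computing these metrics is shown below. It uses scikit-learn's synthetic `make_classification` data with roughly 15% positives as a stand-in for the churn table, so its numbers will not match the results reported for this project; the hyperparameters (`max_iter=1000`, `n_estimators=200`) are assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return the five metrics used in this project for a fitted classifier."""
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of churn (class 1)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }


# Synthetic, imbalanced stand-in data (about 85% non-churners)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85], random_state=42)
# Stratify so the churn rate is the same in train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model, X_test, y_test)
```

Recall on the churn class is worth watching alongside accuracy: with a minority churn class, a model can score high accuracy while missing most churners.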
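Model Performance
Logistic Regression (baseline):
- Accuracy: 86%
- Recall (churn): 23%
- ROC-AUC: 0.79
Random Forest (final model):
- Accuracy: 94%
- Recall (churn): 60%
- ROC-AUC: 0.89
👉 Random Forest was selected as the final model because it outperformed the baseline on accuracy, churn recall, and ROC-AUC.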
Top predictors identified:
- Customer Service Calls
- Day Minutes & Charges
- International Plan
- Evening Minutes & Charges
- International Usage
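A ranking like the one above can be read from a fitted Random Forest's impurity-based `feature_importances_`. The sketch below trains on synthetic data, and the column names are hypothetical placeholders for the project's features. Impurity-based importance tends to split credit between near-duplicate features (for example, minutes and charges if charges are billed per minute), so `sklearn.inspection.permutation_importance` on a held-out set is a useful cross-check.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def top_features(model, feature_names, k=5):
    """Return the k most important features of a fitted tree ensemble, highest first."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Synthetic stand-in data; these column names are hypothetical
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
names = ["CUSTSERV_CALLS", "DAY_MINS", "DAY_CHARGE", "INTL_PLAN", "EVE_MINS", "INTL_MINS"]
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

top = top_features(rf, names, k=5)
```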
Customers are classified into three risk groups by churn probability:
- High Risk → churn score ≥ 0.70
- Medium Risk → 0.40 ≤ churn score < 0.70
- Low Risk → churn score < 0.40
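These thresholds map directly onto `pandas.cut` with left-closed bins, so every score lands in exactly one group (including values such as 0.695 that fall between 0.69 and 0.70). The function name `assign_risk_segment` is hypothetical.

```python
import numpy as np
import pandas as pd


def assign_risk_segment(scores):
    """Bucket churn probabilities into Low / Medium / High risk.

    right=False makes each bin left-closed: [-inf, 0.40), [0.40, 0.70), [0.70, inf).
    """
    return pd.cut(
        pd.Series(scores, dtype=float),
        bins=[-np.inf, 0.40, 0.70, np.inf],
        labels=["Low Risk", "Medium Risk", "High Risk"],
        right=False,
    )
```

For example, `assign_risk_segment([0.10, 0.40, 0.69, 0.70])` yields Low, Medium, Medium, and High Risk.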
Business Recommendations
- Improve customer support for customers who make frequent customer service calls
- Optimize pricing for heavy-usage customers
- Reevaluate international plan offerings
- Run proactive retention campaigns prioritized by churn score
- Use risk segments for targeted marketing
How to Run
```bash
# Clone repository
git clone https://github.com/SyedHussain23/telecom-customer-churn-prediction.git
# Navigate to project folder
cd telecom-customer-churn-prediction
# Install dependencies
pip install -r requirements.txt
# Run notebook
jupyter notebook telecom-customer-churn-prediction.ipynb
```
Update the SQL credentials before running:
```python
HOST = "your_host"
USER = "your_user"
PASSWORD = "your_password"
DB_NAME = "your_database"
```
👉 For security, store credentials in a .env file instead of hard-coding them.
The project generates:
- Churn predictions
- Churn probability scores
- Risk segmentation
- Business-ready dataset
Output file: telecom_churn_predictions.csv
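A per-customer output table of this kind could be assembled as below. This is a sketch only: the column names (`CUSTOMER_ID`, `CHURN_PROBABILITY`, `CHURN_PREDICTION`, `RISK_SEGMENT`), the helper `build_predictions`, and the 0.5 decision threshold are assumptions, not the project's actual export schema. The segment cut-offs follow the risk groups defined earlier in this README.

```python
import numpy as np
import pandas as pd


def build_predictions(customer_ids, proba, threshold=0.5):
    """Assemble a per-customer churn table from predicted probabilities.

    Column names and the 0.5 threshold are hypothetical choices.
    """
    proba = np.asarray(proba, dtype=float)
    return pd.DataFrame({
        "CUSTOMER_ID": list(customer_ids),
        "CHURN_PROBABILITY": proba.round(4),
        # Binary prediction at the chosen threshold
        "CHURN_PREDICTION": (proba >= threshold).astype(int),
        # Risk groups: >= 0.70 High, 0.40 to < 0.70 Medium, < 0.40 Low
        "RISK_SEGMENT": np.select(
            [proba >= 0.70, proba >= 0.40], ["High Risk", "Medium Risk"], default="Low Risk"
        ),
    })


# Usage with a fitted model (model, X, customer_ids are placeholders):
# proba = model.predict_proba(X)[:, 1]
# build_predictions(customer_ids, proba).to_csv("telecom_churn_predictions.csv", index=False)
```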
Future Improvements
- Hyperparameter tuning
- Advanced ensemble models (XGBoost, LightGBM)
- Deep learning churn modeling
- Real-time churn prediction pipeline
- Deployment via API or dashboard
Author
Syed Hussain Abdul Hakeem
🔗 GitHub: https://github.com/SyedHussain23
🔗 LinkedIn: https://www.linkedin.com/in/syed-hussain-abdul-hakeem
If you found this project useful, please consider giving it a ⭐ on GitHub.