
Flight Delay Prediction: Production-Ready MLOps Platform

A complete end-to-end machine learning operations platform that automates the entire ML lifecycle from training to production deployment, with automated drift detection and self-healing capabilities.



Business Problem

Airlines face significant operational challenges when flights are delayed, leading to:

  • Operational Costs: Cascading delays, gate reassignments, crew scheduling conflicts, and aircraft routing disruptions cost airlines millions annually
  • Customer Experience: Passengers experience frustration, missed connections, and lost productivity, directly impacting customer satisfaction and retention
  • Resource Optimization: Inability to proactively allocate resources (ground crew, gates, maintenance) leads to inefficient operations
  • Revenue Impact: Delayed flights result in compensation claims, refunds, and lost future bookings

The Challenge: Traditional reactive approaches to flight delays are insufficient. Airlines need a predictive system that can identify high-risk flights before they depart, enabling proactive interventions and resource optimization.


Solution Overview

This project delivers a production-grade MLOps platform that predicts flight delays with a 0.9156 ROC-AUC on held-out test data, enabling airlines to:

  • Predict delays 15+ minutes in advance with high confidence
  • Automate the entire ML lifecycle from training to deployment
  • Monitor model performance in real-time with automated drift detection
  • Self-heal by automatically retraining when data drift is detected
  • Scale seamlessly on cloud infrastructure with zero-downtime deployments

Key Differentiators

  • End-to-End Automation: Fully automated CI/CD pipeline eliminates manual intervention
  • Production-Ready: Enterprise-grade error handling, logging, monitoring, and observability
  • Self-Monitoring: Automated drift detection with intelligent retraining triggers
  • Scalable Architecture: Cloud-native design using AWS ECS Fargate, containerized with Docker
  • Infrastructure as Code: Complete Terraform automation for reproducible deployments
  • ML Engineering Best Practices: Feature engineering pipelines, model versioning, baseline statistics, threshold optimization

Architecture

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                    GITHUB REPOSITORY                             │
│  Source Code → GitHub Actions CI/CD → AWS Cloud Infrastructure │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│              AUTOMATED ML PIPELINE (GitHub Actions)             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Train Model │→ │ Build Image  │→ │ Deploy ECS   │         │
│  │ (Quality    │  │ (Docker/ECR) │  │ (Blue-Green) │         │
│  │  Gates)     │  │              │  │              │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│              AWS PRODUCTION INFRASTRUCTURE                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ ECS Fargate  │  │ CloudWatch   │  │ S3 Artifacts  │         │
│  │ (FastAPI)    │  │ (Logs/Metrics)│  │ (Models/Stats)│         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│              AUTOMATED MONITORING & DRIFT DETECTION             │
│  EventBridge → Daily Analysis → CloudWatch Alarms → SNS Alerts │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ IF High Drift: Auto-Retrain → Deploy → Notify Team     │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Key Features

1. Automated ML Pipeline

  • Quality Gates: Model must achieve ROC-AUC ≥ 0.85 to deploy
  • Reproducible Training: Fixed random seeds, versioned artifacts, comprehensive logging
  • Feature Engineering: 35 engineered features from 15 raw inputs using advanced techniques:
    • Cyclical temporal encoding (sine/cosine transformations)
    • Weather interaction features
    • Route and airport aggregation statistics
    • Distance-based and time-based binary features
  • Class Imbalance Handling: XGBoost with scale_pos_weight and threshold optimization
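As an illustration of the cyclical temporal encoding step, here is a minimal sketch. The column names `dep_hour` and `day_of_week` are assumptions for illustration; the project's actual transformations live in `MlOps/feature_engineering.py`.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode hour-of-day and day-of-week as sine/cosine pairs so that
    adjacent values wrap correctly (e.g. 23:00 sits next to 00:00)."""
    out = df.copy()
    out["dep_hour_sin"] = np.sin(2 * np.pi * out["dep_hour"] / 24)
    out["dep_hour_cos"] = np.cos(2 * np.pi * out["dep_hour"] / 24)
    out["dow_sin"] = np.sin(2 * np.pi * out["day_of_week"] / 7)
    out["dow_cos"] = np.cos(2 * np.pi * out["day_of_week"] / 7)
    return out

df = pd.DataFrame({"dep_hour": [23, 0], "day_of_week": [6, 0]})
print(add_cyclical_features(df).round(3))
```

The payoff is that hour 23 and hour 0 end up close together in feature space, which a raw integer encoding cannot express.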

2. Production-Grade API

  • FastAPI Framework: High-performance async API with automatic OpenAPI documentation
  • Input Validation: Pydantic schemas with comprehensive field validation
  • Batch Processing: Single and batch prediction endpoints
  • Health Monitoring: /health endpoint for ECS health checks
  • Prediction Logging: All predictions logged for drift detection and audit trails

3. CI/CD Automation

  • GitHub Actions Workflows: Three-stage pipeline (Train → Build → Deploy)
  • Artifact Management: GitHub Artifacts and S3 for model versioning
  • Docker Containerization: Multi-stage builds, optimized image sizes
  • Blue-Green Deployments: Zero-downtime deployments on AWS ECS Fargate
  • Infrastructure as Code: Complete Terraform automation

4. Model Monitoring & Drift Detection

  • Automated Daily Analysis: EventBridge-triggered drift detection at 2 AM UTC
  • Statistical Tests: Kolmogorov-Smirnov, Chi-square, PSI (Population Stability Index)
  • Multi-Metric Monitoring: Mean shifts, variance changes, quantile shifts, z-score analysis
  • Automated Remediation: High drift triggers automatic retraining workflow
  • Alerting: CloudWatch Alarms → SNS → Email notifications
  • Issue Tracking: Automatic GitHub issue creation with labels

5. Observability & Monitoring

  • CloudWatch Logs: Centralized logging with 7-14 day retention
  • CloudWatch Metrics: Custom metrics for drift severity, feature drift counts
  • CloudWatch Alarms: Threshold-based alerts for high drift scenarios
  • Health Checks: Container-level and ECS service-level health monitoring
  • Prediction Tracking: Complete audit trail of all predictions

Model Performance

Metrics

| Metric    | Training | Test (Default Threshold) | Test (Optimal Threshold) |
|-----------|----------|--------------------------|--------------------------|
| ROC-AUC   | 1.0000   | 0.9156                   | 0.9156                   |
| Accuracy  | 99.85%   | 89.60%                   | 86.70%                   |
| Precision | 0.9995   | 0.6250                   | 0.4957                   |
| Recall    | 0.9995   | 0.5631                   | 0.6900                   |
| F1-Score  | 0.9995   | 0.5926                   | 0.5273                   |

Model Details

  • Algorithm: XGBoost (Gradient Boosting)
  • Features: 35 engineered features from 15 raw inputs
  • Optimal Threshold: 0.3711 (optimized for F1-score on imbalanced data)
  • Class Distribution: 89.72% not delayed, 10.28% delayed (8.73:1 ratio)
  • Training Samples: 4,000 (80% split)
  • Test Samples: 1,000 (20% split)

Feature Importance

Top predictive features:

  1. arr_delay_minutes (correlation: 0.76)
  2. dep_weather_wind_kts (correlation: 0.33)
  3. airport_congestion_score (correlation: 0.16)
  4. Weather interaction features
  5. Temporal cyclical features

Technology Stack

Machine Learning

  • XGBoost: Gradient boosting for binary classification
  • scikit-learn: Model evaluation, metrics, train/test splitting
  • pandas: Data manipulation and feature engineering
  • numpy: Numerical computations

Backend & API

  • FastAPI: High-performance async web framework
  • Pydantic: Data validation and serialization
  • uvicorn: ASGI server for FastAPI

Infrastructure & DevOps

  • AWS ECS Fargate: Serverless container orchestration
  • AWS ECR: Container registry with image scanning
  • AWS S3: Model artifacts and drift reports storage
  • AWS CloudWatch: Logging, metrics, and alarms
  • AWS SNS: Notification service for alerts
  • AWS EventBridge: Scheduled drift detection triggers
  • Terraform: Infrastructure as Code
  • Docker: Containerization
  • GitHub Actions: CI/CD automation

Monitoring & Observability

  • CloudWatch Logs: Centralized logging
  • CloudWatch Metrics: Custom ML metrics
  • CloudWatch Alarms: Automated alerting
  • SNS: Email notifications

Project Structure

.
├── MlOps/                          # ML Pipeline Components
│   ├── feature_engineering.py      # Production feature engineering (35 features)
│   ├── train_model.py             # Automated training script with quality gates
│   ├── model_drift_job.py         # Statistical drift detection
│   ├── requirements.txt           # Python dependencies
│   └── README.md                  # ML pipeline documentation
│
├── services/                       # Deployment Services
│   └── inference_api/             # FastAPI Production API
│       ├── main.py                # API application with validation
│       ├── Dockerfile             # Optimized container definition
│       ├── requirements.txt       # API dependencies
│       └── PREDICTION_GUIDE.md    # API usage documentation
│
├── terraform/                      # Infrastructure as Code
│   ├── main.tf                    # Complete AWS infrastructure
│   ├── variables.tf               # Configurable variables
│   ├── outputs.tf                 # Terraform outputs
│   └── README.md                  # Infrastructure setup guide
│
├── .github/                        # CI/CD Automation
│   ├── workflows/
│   │   ├── ml-pipeline.yml        # Main training/deployment pipeline
│   │   └── drift-detection.yml    # Scheduled drift detection
│   └── aws/
│       └── task-definition.json   # ECS task definition template
│
├── ARCHITECTURE_DIAGRAM.md         # Mermaid architecture diagrams
├── CLOUD_ARCHITECTURE_DIAGRAM.md   # Cloud architecture visualization
├── CI_CD_PIPELINE_GUIDE.md        # Detailed CI/CD documentation
├── synthetic_flight_data.csv       # Training dataset (5,000 samples)
└── README.md                       # This file

API Design

RESTful Endpoints

The production API exposes the following endpoints:

  • GET /health: Health check endpoint for monitoring and load balancers
  • POST /predict: Single flight delay prediction with comprehensive input validation
  • POST /predict/batch: Batch prediction for processing multiple flights efficiently
  • GET /predictions/log: Retrieve prediction logs for drift analysis and auditing

API Features

  • Input Validation: Pydantic schemas ensure type safety and data integrity
  • Automatic Documentation: OpenAPI/Swagger UI and ReDoc generated automatically
  • Error Handling: Comprehensive HTTP error codes with descriptive messages
  • Prediction Logging: All predictions logged with timestamps for drift detection
  • Model Versioning: API responses include model version for traceability

Example Response

{
  "prediction": 0,
  "probability": 0.15,
  "probability_not_delayed": 0.85,
  "threshold": 0.3711,
  "model_version": "1.0.0",
  "timestamp": "2024-08-15T10:30:00"
}

CI/CD Pipeline

Pipeline Stages

  1. Train Model (train-model job)

    • Checkout code
    • Setup Python 3.9 with pip caching
    • Install dependencies
    • Train XGBoost model with quality gates
    • Upload artifacts (model.pkl, feature_engineer.pkl, baseline_stats.json)
    • Upload baseline to S3 for drift detection
  2. Build & Push Image (build-and-push-image job)

    • Download model artifacts
    • Authenticate with ECR
    • Build Docker image (multi-tag: SHA + latest)
    • Push to ECR
    • Save image URI for deployment
  3. Deploy to ECS (deploy-to-ecs job)

    • Download image URI
    • Update ECS task definition
    • Deploy with blue-green strategy
    • Wait for service stability
    • Output public IP address
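The quality gate in the training stage can be sketched as follows (a hypothetical helper, not the project's exact code; the real script is `MlOps/train_model.py`). A non-zero exit code fails the GitHub Actions job, so a below-threshold model never reaches the build stage:

```python
import sys
from sklearn.metrics import roc_auc_score

MIN_ROC_AUC = 0.85  # quality gate threshold from the pipeline

def enforce_quality_gate(y_true, y_scores) -> float:
    """Fail the CI job (exit 1) if test ROC-AUC falls below the gate."""
    auc = roc_auc_score(y_true, y_scores)
    if auc < MIN_ROC_AUC:
        print(f"Quality gate FAILED: ROC-AUC {auc:.4f} < {MIN_ROC_AUC}")
        sys.exit(1)
    print(f"Quality gate passed: ROC-AUC {auc:.4f}")
    return auc
```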

Trigger Conditions

  • Automatic: Push to main branch when:
    • synthetic_flight_data.csv changes
    • MlOps/** files change
    • services/inference_api/** files change
  • Manual: workflow_dispatch for on-demand execution
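The trigger conditions above correspond to a workflow header roughly like this (an illustrative sketch; the actual `.github/workflows/ml-pipeline.yml` may differ in detail):

```yaml
on:
  push:
    branches: [main]
    paths:
      - "synthetic_flight_data.csv"
      - "MlOps/**"
      - "services/inference_api/**"
  workflow_dispatch: {}
```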

Model Monitoring & Drift Detection

Automated Drift Detection

Schedule: Daily at 2 AM UTC (EventBridge trigger)

Process:

  1. Fetch prediction logs from production API
  2. Download baseline statistics from S3
  3. Run statistical drift analysis:
    • Kolmogorov-Smirnov test
    • Chi-square test
    • Population Stability Index (PSI)
    • Mean, variance, quantile comparisons
  4. Calculate drift severity (0-3 scale)
  5. Generate comprehensive drift report
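The PSI component of step 3 can be sketched as follows (a minimal illustration with synthetic data, not the project's exact implementation in `MlOps/model_drift_job.py`):

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """PSI between a baseline (training) and current (production) sample.
    Rule of thumb: <0.1 stable, 0.1-0.25 moderate drift, >0.25 high drift."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) / division by zero in empty bins
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)
same = rng.normal(0, 1, 5000)       # no drift   -> PSI near 0
shifted = rng.normal(0.5, 1, 5000)  # mean shift -> elevated PSI
print(population_stability_index(baseline, same))
print(population_stability_index(baseline, shifted))
```

Per-feature PSI scores like these feed into the overall 0-3 drift severity that gates the retraining trigger.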

Drift Response

If High Drift Detected (Severity ≥ 2.5):

  • Publish metrics to CloudWatch
  • Trigger CloudWatch Alarms
  • Send SNS email notifications
  • Automatically trigger retraining workflow
  • Create GitHub issue with labels

If Low/No Drift:

  • Upload drift report to S3
  • Continue monitoring

Monitoring Metrics

  • DriftSeverity: Overall drift severity (0-3)
  • FeaturesWithDrift: Count of features showing drift
  • PSI_Score: Population Stability Index

Key Learnings & Best Practices

MLOps Engineering

  • Reproducibility: Fixed random seeds, versioned artifacts, comprehensive logging
  • Quality Gates: Automated model validation before deployment
  • Feature Engineering: Production-ready fit/transform pattern with unseen category handling
  • Class Imbalance: XGBoost scale_pos_weight + threshold optimization
  • Model Versioning: Artifact versioning in S3 and ECR
  • Baseline Statistics: Training data statistics for drift detection

DevOps & Infrastructure

  • Infrastructure as Code: Complete Terraform automation
  • Containerization: Docker with optimized multi-stage builds
  • CI/CD Automation: GitHub Actions with artifact management
  • Blue-Green Deployments: Zero-downtime deployments
  • Health Checks: Container and service-level monitoring
  • Secrets Management: GitHub Secrets with least-privilege IAM

Observability

  • Centralized Logging: CloudWatch Logs with retention policies
  • Custom Metrics: ML-specific metrics (drift severity, feature drift)
  • Automated Alerting: CloudWatch Alarms → SNS → Email
  • Prediction Tracking: Complete audit trail for compliance
  • Drift Detection: Automated statistical analysis with remediation



Testing & Validation

Model Validation

  • Quality Gates: Automated validation ensures ROC-AUC ≥ 0.85 before deployment
  • Stratified Splitting: Maintains class distribution in train/test sets
  • Comprehensive Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC tracked
  • Threshold Optimization: Precision-recall curve analysis for optimal threshold selection
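The threshold optimization step can be sketched as a search over the precision-recall curve (an illustrative helper with toy scores, not the project's exact code):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def optimal_f1_threshold(y_true, y_scores) -> float:
    """Pick the decision threshold that maximizes F1 along the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision/recall have one more element than thresholds; drop the last point
    f1 = (2 * precision[:-1] * recall[:-1]
          / np.clip(precision[:-1] + recall[:-1], 1e-12, None))
    return float(thresholds[int(np.argmax(f1))])

y_true = np.array([0, 0, 0, 0, 1, 1])
y_scores = np.array([0.05, 0.1, 0.2, 0.45, 0.4, 0.9])
print(optimal_f1_threshold(y_true, y_scores))
```

On imbalanced data this typically selects a cutoff below 0.5 (0.3711 in this project), trading some precision for recall.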

API Testing

  • Health Checks: Container and service-level health monitoring
  • Input Validation: Pydantic schemas validate all inputs before processing
  • Error Handling: Comprehensive error responses with appropriate HTTP status codes
  • Load Testing: API designed to handle concurrent requests with async FastAPI

Integration Testing

  • CI/CD Pipeline: Automated testing in GitHub Actions workflows
  • End-to-End Validation: Complete pipeline tested from training to deployment
  • Drift Detection: Automated statistical tests validate model performance

Technical Achievements

MLOps Engineering Excellence

  • Complete Automation: End-to-end CI/CD pipeline from code commit to production deployment
  • Quality Assurance: Automated quality gates ensure only high-performing models deploy
  • Self-Healing System: Automated drift detection triggers retraining when model performance degrades
  • Production-Grade Code: Comprehensive error handling, logging, monitoring, and observability
  • Scalable Design: Cloud-native architecture with containerization and infrastructure as code
  • ML Best Practices: Advanced feature engineering, class imbalance handling, threshold optimization

Infrastructure & DevOps

  • Infrastructure as Code: Complete Terraform automation for reproducible deployments
  • Container Orchestration: AWS ECS Fargate with blue-green deployment strategy
  • Secrets Management: Secure credential handling with GitHub Secrets and IAM roles
  • Monitoring Stack: CloudWatch Logs, Metrics, Alarms, and SNS notifications
  • Artifact Management: Versioned model artifacts in S3 and container images in ECR

Machine Learning

  • Feature Engineering: 35 engineered features from 15 raw inputs using advanced techniques
  • Model Performance: 0.9156 ROC-AUC with optimized threshold for imbalanced data
  • Reproducibility: Fixed random seeds, versioned artifacts, comprehensive logging
  • Drift Detection: Statistical analysis with Kolmogorov-Smirnov, Chi-square, and PSI tests
  • Baseline Statistics: Training data statistics captured for production monitoring


License

This project is for educational and demonstration purposes, showcasing production-ready MLOps engineering practices.


Author

Faizan Huda

ML Engineer | MLOps Specialist | Cloud Architect

Built to demonstrate enterprise-grade MLOps engineering practices and production-ready machine learning systems.
