A production-style ETL pipeline designed to automate the ingestion, transformation, and loading of Airbnb listings data into a PostgreSQL data warehouse.
The workflow is fully containerized with Docker and orchestrated using Apache Airflow for daily scheduling and monitoring.
The Airbnb Data Pipeline automates the following key processes:
- Extract – Ingests Airbnb listings from CSV files or a public API
- Transform – Cleans, enriches, and validates the data using pandas
- Load – Loads the processed data into a PostgreSQL warehouse using SQLAlchemy
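As a rough sketch of the first two steps, the snippet below reads a local CSV export of listings and applies a few representative cleaning rules with pandas; the file path, column names (`id`, `price`), and enrichment logic are illustrative assumptions rather than the project's exact implementation.

```python
import pandas as pd


def extract(csv_path: str = "data/listings.csv") -> pd.DataFrame:
    """Read the raw Airbnb listings export into a DataFrame."""
    return pd.read_csv(csv_path)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean, enrich, and validate the raw listings."""
    df = raw.copy()

    # Drop rows missing identifiers or prices we cannot recover.
    df = df.dropna(subset=["id", "price"])

    # Normalize price strings such as "$1,250.00" into floats.
    df["price"] = (
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )

    # Example enrichment: flag listings priced above the median.
    df["above_median_price"] = df["price"] > df["price"].median()

    # Basic validation before handing off to the load step.
    if (df["price"] <= 0).any():
        raise ValueError("Found non-positive prices after cleaning")

    return df
```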
This setup demonstrates how to design a modular, scalable, and reproducible ETL process suitable for real-world data engineering environments. The pipeline is built on the following stack:
- Python – Data manipulation, validation, and scripting
- Apache Airflow – Workflow orchestration and task scheduling
- PostgreSQL – Centralized data storage and modeling
- SQLAlchemy – ORM for efficient database interaction
- Docker & Docker Compose – Containerization for portability and consistency
Below is a snapshot of the pipeline running in Apache Airflow, showing the three core ETL tasks executed in sequence:
Each DAG run performs:
- Extract: Fetch raw Airbnb data
- Transform: Apply cleaning and enrichment logic
- Load: Store refined data in PostgreSQL
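A minimal sketch of how such a DAG might be wired up, assuming a recent Airflow 2.x release, an `etl` module exposing the functions sketched above, and a shared volume for staging files; the task ids, paths, and schedule are placeholders, not the project's exact configuration.

```python
from datetime import datetime, timedelta

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

# extract/transform/load are the functions sketched above; the "etl"
# module name is a placeholder for the project's actual package layout.
from etl import extract, transform, load

RAW = "/opt/airflow/data/listings_raw.csv"      # staging paths on a shared
CLEAN = "/opt/airflow/data/listings_clean.csv"  # volume (assumed layout)


def run_extract() -> None:
    extract().to_csv(RAW, index=False)


def run_transform() -> None:
    transform(pd.read_csv(RAW)).to_csv(CLEAN, index=False)


def run_load() -> None:
    load(pd.read_csv(CLEAN))


with DAG(
    dag_id="airbnb_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=run_extract)
    transform_task = PythonOperator(task_id="transform", python_callable=run_transform)
    load_task = PythonOperator(task_id="load", python_callable=run_load)

    # Run the three ETL steps in sequence, matching the Airflow UI view.
    extract_task >> transform_task >> load_task
```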
This project highlights hands-on experience with modern data engineering practices, including:
- Workflow automation using Airflow DAGs
- Data extraction and cleaning with pandas
- Data modeling and loading into PostgreSQL
- Deployment via Docker Compose for easy reproducibility
It demonstrates how to build a scalable, maintainable ETL pipeline from scratch, providing a solid foundation for production-grade data workflows. Planned future enhancements include:
- Add data quality validation with Great Expectations
- Integrate automated logging and alerting
- Extend to real-time ingestion using Kafka or Spark Streaming
