This project demonstrates a fully serverless data analytics pipeline on AWS. It ingests raw sales data stored in Amazon S3, catalogs the data using AWS Glue, queries it with Amazon Athena, and visualizes the insights using Amazon QuickSight.
- ✅ 100% Infrastructure as Code using Terraform
- ✅ Uses IAM best practices to manage permissions
- ✅ Visual dashboards for quick business insights
- ✅ Easy to extend with larger datasets and more complex ETL
- Amazon S3: Raw sales CSV data storage
- AWS Glue: Data crawler that scans S3 and creates metadata in the Glue Data Catalog
- Glue Data Catalog: Stores schema information for Athena queries
- Amazon Athena: Executes SQL queries on cataloged data
- Amazon QuickSight: Connects to Athena and builds interactive dashboards
- IAM Roles: Securely manages service permissions with least privilege access
- Infrastructure as Code (IaC): Easily reproducible and version-controlled with Terraform
- Scalable: Handles datasets from hundreds of rows to millions of records
- Serverless: No servers to manage, maintain, or patch
- Secure: Implements IAM roles with least privilege access principles
- Interactive Visualization: Business dashboards built with Amazon QuickSight on top of Athena query results
- Cost-Effective: Pay only for what you use with serverless architecture
- AWS CLI configured with appropriate permissions
- Terraform installed (version 1.0+)
- Git for repository management
git clone https://github.com/your-username/terraform-data-pipeline.git
cd terraform-data-pipeline
Edit variables.tf to modify:
- Bucket names
- AWS region
- Project tags
- Resource naming conventions
terraform init
terraform plan
terraform apply
Upload sales_data.csv (or your own dataset) to the created S3 bucket:
aws s3 cp sample-data/sales_data.csv s3://your-bucket-name/
- Navigate to AWS Glue Console
- Start the created crawler to populate the Data Catalog
- Verify table creation in the Data Catalog
- Open Amazon Athena Console
- Run SQL queries on your cataloged data
- Example query:
SELECT product_category, SUM(sales_amount) as total_sales
FROM your_table_name
GROUP BY product_category
ORDER BY total_sales DESC;
- Connect QuickSight to your Athena data source
- Create visualizations using the drag-and-drop interface
- Publish dashboards for business stakeholders
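Before running the example query in Athena, it can help to check the aggregation logic locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Athena. The table name `sales`, the sample rows, and the helper `total_sales_by_category` are hypothetical; only the column names `product_category` and `sales_amount` come from the example query, and the real schema is whatever the Glue crawler infers from your data.

```python
import sqlite3

# Hypothetical sample rows (product_category, sales_amount); not taken from the real dataset.
SAMPLE_ROWS = [
    ("Electronics", 1200.0),
    ("Furniture", 450.0),
    ("Electronics", 300.0),
    ("Clothing", 80.0),
    ("Furniture", 50.0),
]

# Same shape as the example Athena query, pointed at a local table named "sales".
QUERY = """
SELECT product_category, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY product_category
ORDER BY total_sales DESC;
"""


def total_sales_by_category(rows):
    """Load rows into an in-memory SQLite table and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (product_category TEXT, sales_amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for category, total in total_sales_by_category(SAMPLE_ROWS):
        print(f"{category}: {total:.2f}")
```

SQLite and Athena differ in many details (types, functions, partition handling), so this only checks the grouping and ordering logic, not Athena-specific behavior.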
This project implements several security best practices:
- ✅ No Hardcoded Credentials: All access is managed through IAM roles
- ✅ State File Security: terraform.tfstate is excluded via .gitignore
- ✅ Least Privilege Access: IAM policies grant minimal required permissions
- ✅ Resource Isolation: Dedicated IAM roles for each service component
- ✅ Encryption: S3 buckets and data transfers are encrypted
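To make the least-privilege idea concrete, here is a minimal sketch of what a read-only S3 policy document might look like, built as a Python dictionary. This is not the project's actual policy (that lives in the Terraform code). The function name `read_only_bucket_policy`, the `Sid` labels, and the bucket and prefix values are assumptions for illustration.

```python
import json


def read_only_bucket_policy(bucket_name, prefix=""):
    """Build an IAM policy document allowing read-only access to one bucket.

    If prefix is given (for example "raw/"), object reads are limited to keys
    under that prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading is an object-level action, so it targets object ARNs.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # "your-bucket-name" is a placeholder, matching the upload command in this README.
    print(json.dumps(read_only_bucket_policy("your-bucket-name", "raw/"), indent=2))
```

The point of the sketch is that each statement names specific actions and specific resources rather than using `s3:*` or `*`. A Glue crawler role would typically need additional Glue and logging permissions beyond this.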
- Update S3 bucket structure in variables.tf
- Modify Glue crawler configuration for new data formats
- Adjust Athena queries for additional tables
- Enable CloudTrail for audit logging
- Implement data partitioning strategies
- Add automated data quality checks
- Set up monitoring and alerting with CloudWatch
- Add Glue ETL jobs for data transformation
- Implement data validation and cleansing
- Schedule automated data processing workflows
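As one possible starting point for the data quality and validation items above, the following sketch checks a CSV file before it is uploaded to S3. It assumes the columns `product_category` and `sales_amount` from the example Athena query. The function name `find_bad_rows` and the specific checks are hypothetical and would need adapting to the real dataset.

```python
import csv
import io

# Assumed from the example Athena query; adjust to the real schema.
REQUIRED_COLUMNS = ("product_category", "sales_amount")


def find_bad_rows(csv_text):
    """Return a list of (line_number, reason) for rows that fail basic checks."""
    reader = csv.DictReader(io.StringIO(csv_text))

    # Fail fast if the header is missing any required column.
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, f"missing columns: {', '.join(missing)}")]

    problems = []
    for row in reader:
        line = reader.line_num  # line in the source text (header is line 1)
        if not (row["product_category"] or "").strip():
            problems.append((line, "empty product_category"))
        try:
            float(row["sales_amount"])
        except (TypeError, ValueError):
            problems.append((line, "sales_amount is not a number"))
    return problems


if __name__ == "__main__":
    sample = "product_category,sales_amount\nElectronics,1200.50\n,300\nFurniture,abc\n"
    for line, reason in find_bad_rows(sample):
        print(f"line {line}: {reason}")
```

A check like this could run locally before `aws s3 cp`, or later be moved into a Glue ETL job. It only covers presence and numeric format, not business rules such as value ranges or duplicates.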
- S3: Use Intelligent Tiering for automatic cost optimization
- Athena: Optimize queries and use columnar formats (Parquet)
- Glue: Schedule crawlers efficiently to avoid unnecessary runs
- QuickSight: Choose appropriate licensing model based on user count
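Partitioning is one common way to reduce the amount of data Athena scans, since queries that filter on partition columns can skip unrelated prefixes. The sketch below builds Hive-style S3 keys (`key=value` path segments), a layout that Glue crawlers can recognize as partitions. The `sales` prefix, the year/month/day scheme, and the function name `partitioned_key` are assumptions, not part of this project's Terraform configuration.

```python
from datetime import date


def partitioned_key(sale_date, filename, prefix="sales"):
    """Return a Hive-style S3 object key for a file belonging to sale_date."""
    return (
        f"{prefix}/year={sale_date.year:04d}/month={sale_date.month:02d}/"
        f"day={sale_date.day:02d}/{filename}"
    )


if __name__ == "__main__":
    # Prints: sales/year=2025/month=07/day=04/sales_data.csv
    print(partitioned_key(date(2025, 7, 4), "sales_data.csv"))
```

The resulting key would be appended to the bucket URI in the upload command, for example `s3://your-bucket-name/` followed by the key. Whether daily partitions are appropriate depends on data volume; very small files per partition can hurt query performance.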
Glue Crawler Fails
- Check IAM permissions for S3 access
- Verify S3 bucket and path configuration
- Ensure data format is supported
Athena Query Errors
- Confirm Data Catalog table exists
- Check query syntax and table names
- Verify result location S3 bucket permissions
QuickSight Connection Issues
- Ensure QuickSight has permissions to access Athena
- Check VPC configuration if using private subnets
- Verify data source configuration
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License. See LICENSE.txt for details.
Built by Hasan Adnan 🚀
- 📧 Email: hassanmoaid44@gmail.com
- 💼 LinkedIn: Let's connect on LinkedIn!
- 🐙 GitHub: hasan4adnan
- AWS Documentation and Best Practices
- Terraform AWS Provider Documentation
- Community feedback and contributions
Last updated: July 2025






