scaleborg/urban-mobility-control-tower

Urban Mobility Control Tower

End-to-end real-time data pipeline for bike-share availability metrics. Ingests Citi Bike GBFS feeds, captures changes via CDC, computes streaming aggregates, and serves analytical data through a low-latency API.

Architecture


Component             Role
ingest/               Polls the Citi Bike GBFS feed into Postgres every 60s
connectors/           Debezium CDC connector config
flink/                Flink SQL: tumbling 1-min availability windows
analytics/consumer/   Kafka → DuckDB landing
analytics/dbt/        dbt mart: mart_station_availability_hourly
api/                  FastAPI read-only API over DuckDB
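
As a rough illustration of what the ingest/ step has to do, the sketch below flattens a GBFS station_status payload into rows ready for insertion into Postgres. The field names (station_id, num_bikes_available, num_docks_available, last_reported) follow the public GBFS 2.x specification; the function name, the row layout, and the sample payload are assumptions and are not taken from this repository.

```python
from datetime import datetime, timezone
from typing import Any


def flatten_station_status(payload: dict[str, Any]) -> list[tuple]:
    """Turn a GBFS station_status document into
    (station_id, bikes, docks, reported_at) rows."""
    rows = []
    for station in payload["data"]["stations"]:
        rows.append((
            station["station_id"],
            int(station["num_bikes_available"]),
            # Docks may be absent for dockless systems; default to 0 (assumption).
            int(station.get("num_docks_available", 0)),
            # GBFS 2.x reports last_reported as POSIX seconds.
            datetime.fromtimestamp(station["last_reported"], tz=timezone.utc),
        ))
    return rows


# Hypothetical sample payload, trimmed to the fields used above.
sample = {
    "last_updated": 1700000000,
    "ttl": 60,
    "data": {
        "stations": [
            {
                "station_id": "66dd5a42-0aca-11e7-82f6-3863bb44ef7c",
                "num_bikes_available": 18,
                "num_docks_available": 2,
                "last_reported": 1700000000,
            }
        ]
    },
}
rows = flatten_station_status(sample)
```

Keeping the parsing step separate from the HTTP poll and the database write makes it easy to test against a saved feed snapshot.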

How to run

1. Start the pipeline

docker compose up -d

This starts Postgres, Kafka, Debezium, Flink, the GBFS ingestor, and the DuckDB consumer.

2. Register the Debezium connector

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @connectors/debezium-postgres.json
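
After the POST, it can help to confirm that the connector and its tasks actually reached RUNNING. Kafka Connect exposes GET /connectors/{name}/status for this, and the sketch below interprets such a response. The connector name is a placeholder, since the name defined in connectors/debezium-postgres.json is not shown here, and the helper names are hypothetical.

```python
import json
import urllib.request


def is_healthy(status: dict) -> bool:
    """True when the connector and every task report RUNNING (at least one task)."""
    tasks = status.get("tasks", [])
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    return connector_ok and bool(tasks) and all(
        task.get("state") == "RUNNING" for task in tasks
    )


def fetch_status(name: str, base_url: str = "http://localhost:8083") -> dict:
    """Fetch the status document from Kafka Connect (requires the stack to be up)."""
    url = f"{base_url}/connectors/{name}/status"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Example document in the shape Kafka Connect returns; values are illustrative.
example = {
    "name": "<connector-name>",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}],
}
print(is_healthy(example))
```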

3. Submit the Flink SQL job

docker compose exec jobmanager ./bin/sql-client.sh -f /opt/flink/sql/station_availability_1min.sql
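
To make the windowing concrete, here is a plain-Python sketch of what a tumbling 1-minute window computes: each event falls into exactly one window, found by flooring its timestamp to the minute, and values are then aggregated per station and window. This only illustrates the semantics. It is not the Flink SQL in station_availability_1min.sql, it ignores watermarks and late data, and the event layout is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def tumbling_1min_avg(events):
    """events: iterable of (station_id, event_time, bikes_available).
    Returns {(station_id, window_start): average bikes_available}."""
    buckets = defaultdict(list)
    for station_id, ts, bikes in events:
        # Floor to the minute: windows are [start, start + 1 min), non-overlapping.
        window_start = ts.replace(second=0, microsecond=0)
        buckets[(station_id, window_start)].append(bikes)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


events = [
    ("s1", datetime(2026, 4, 3, 10, 0, 5), 18),
    ("s1", datetime(2026, 4, 3, 10, 0, 50), 16),
    ("s1", datetime(2026, 4, 3, 10, 1, 0), 15),  # boundary event opens the next window
]
result = tumbling_1min_avg(events)
```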

4. Run dbt (after data lands in DuckDB)

cd analytics/dbt
dbt run
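
For orientation, the hourly mart can be thought of as a roll-up of finer-grained rows into one record per station and hour. The sketch below borrows column names from the API example output in this README, but the actual dbt model is not reproduced here: the input layout, the 0.1 low-availability threshold, and the choice of ratio = bikes / capacity are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

LOW_AVAILABILITY_THRESHOLD = 0.1  # assumed; the real threshold is not documented here


def hourly_rollup(rows):
    """rows: iterable of (station_id, window_start, bikes_available, capacity)."""
    groups = defaultdict(list)
    for station_id, ts, bikes, capacity in rows:
        hour_ts = ts.replace(minute=0, second=0, microsecond=0)
        ratio = bikes / capacity if capacity else 0.0
        groups[(station_id, hour_ts)].append(ratio)
    return [
        {
            "station_id": station_id,
            "hour_ts": hour_ts.isoformat(),
            "avg_availability_ratio": round(sum(ratios) / len(ratios), 3),
            "min_availability_ratio": round(min(ratios), 3),
            "total_low_availability_events": sum(
                r < LOW_AVAILABILITY_THRESHOLD for r in ratios
            ),
            "total_events": len(ratios),
        }
        for (station_id, hour_ts), ratios in sorted(groups.items())
    ]


rows = [
    ("s1", datetime(2026, 4, 3, 10, 5), 18, 22),
    ("s1", datetime(2026, 4, 3, 10, 6), 1, 22),
]
mart = hourly_rollup(rows)
```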

5. Start the API

cd api
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
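
Once the server is up, the endpoints can be queried from Python as well as curl. Below is a minimal stdlib client, assuming the host and port from the uvicorn command above and the /metrics/stations/hourly route with its limit parameter as it appears in the example output. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def hourly_metrics_url(limit: int, base_url: str = BASE_URL) -> str:
    """Build the request URL, encoding the query string safely."""
    query = urllib.parse.urlencode({"limit": limit})
    return f"{base_url}/metrics/stations/hourly?{query}"


def fetch_hourly_metrics(limit: int = 10) -> list[dict]:
    """Call the API and decode the JSON array (requires the API to be running)."""
    with urllib.request.urlopen(hourly_metrics_url(limit), timeout=5) as resp:
        return json.load(resp)


print(hourly_metrics_url(2))
```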

Example outputs

$ curl 'localhost:8000/metrics/stations/hourly?limit=2'
[
  {
    "station_id": "66dd5a42-0aca-11e7-82f6-3863bb44ef7c",
    "hour_ts": "2026-04-03T10:00:00",
    "avg_bikes_available": 18.0,
    "avg_docks_available": 2.0,
    "avg_capacity": 22.0,
    "avg_availability_ratio": 0.818,
    "min_availability_ratio": 0.818,
    "total_low_availability_events": 0,
    "total_events": 8
  }
]
$ curl localhost:8000/system/freshness
{
  "latest_emitted_at": "2026-04-03T10:40:00",
  "latest_mart_hour_ts": "2026-04-03T10:00:00",
  "checked_at": "2026-04-03T10:40:55.494591+00:00",
  "raw_lag_seconds": 55
}
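
The numbers in these samples can be cross-checked by hand. raw_lag_seconds is consistent with the whole-second difference between checked_at and latest_emitted_at, and the sample availability ratio matches 18 / 22. The sketch below reproduces both values. Treating latest_emitted_at as UTC, truncating to whole seconds, and dividing bikes by capacity are assumptions about how the pipeline computes them.

```python
from datetime import datetime, timezone

# Values copied from the sample responses above.
latest_emitted_at = datetime.fromisoformat("2026-04-03T10:40:00").replace(
    tzinfo=timezone.utc  # the sample has no offset; UTC is assumed
)
checked_at = datetime.fromisoformat("2026-04-03T10:40:55.494591+00:00")

raw_lag_seconds = int((checked_at - latest_emitted_at).total_seconds())
print(raw_lag_seconds)

availability_ratio = round(18.0 / 22.0, 3)
print(availability_ratio)
```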

Design trade-offs

  • Streaming correctness — Flink watermarks and tumbling windows produce deterministic 1-minute aggregates from CDC events
  • Batch/serving parity — the same DuckDB mart backs both dbt analysis and API serving
  • Production-style data contracts — each layer (raw → staging → mart → API) has a clear schema boundary
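
One lightweight way to express a schema boundary at the API edge is a typed record that rejects rows with missing or mistyped fields. The sketch below uses the field names from the sample hourly response. The class name and the validation rules are assumptions and are not taken from the repository.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StationHourlyRow:
    station_id: str
    hour_ts: str
    avg_bikes_available: float
    avg_docks_available: float
    avg_capacity: float
    avg_availability_ratio: float
    min_availability_ratio: float
    total_low_availability_events: int
    total_events: int

    @classmethod
    def from_dict(cls, raw: dict) -> "StationHourlyRow":
        expected = {f.name: f.type for f in fields(cls)}
        missing = set(expected) - set(raw)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        for name, typ in expected.items():
            # JSON numbers may arrive as int where a float is declared.
            allowed = (int, float) if typ is float else typ
            if not isinstance(raw[name], allowed):
                raise TypeError(f"{name}: expected {typ.__name__}")
        # Unknown extra keys are ignored rather than rejected (a design choice).
        return cls(**{name: raw[name] for name in expected})


row = StationHourlyRow.from_dict({
    "station_id": "66dd5a42-0aca-11e7-82f6-3863bb44ef7c",
    "hour_ts": "2026-04-03T10:00:00",
    "avg_bikes_available": 18.0,
    "avg_docks_available": 2.0,
    "avg_capacity": 22.0,
    "avg_availability_ratio": 0.818,
    "min_availability_ratio": 0.818,
    "total_low_availability_events": 0,
    "total_events": 8,
})
```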

About

P1 — Control Tower for real-time mobility decisioning system (stream processing + operational surface)
