A monitoring system built with Go 1.26.
The OpenAPI specification is located under `openapi/`.
Spin up containers:

```sh
make docker/up
```

Run the REST API:

```sh
make go/rest/run
```

Run migrations:

```sh
make db/migrate/up
```

Destroy containers:

```sh
make docker/down
```

Start minikube and create the cerberus namespace:

```sh
make k8s/setup
```

Build the application and migration Docker images inside minikube's Docker daemon:

```sh
make k8s/build
```

Deploy all services (PostgreSQL, Vault, Tempo, Grafana, App) via Helm and check their status:

```sh
make k8s/deploy
make k8s/status
```

Use `minikube service` to access NodePort services:

```sh
minikube service cerberus-app -n cerberus
minikube service cerberus-grafana -n cerberus
```

Undeploy everything:

```sh
make k8s/undeploy
```

Each service has its own Helm chart under `.kubernetes/`:
| Chart | Service | Ports |
|---|---|---|
| `postgres/` | PostgreSQL 17.5 | 5432 |
| `vault/` | HashiCorp Vault 1.21 | 8200 |
| `tempo/` | Grafana Tempo | 3200, 4317, 4318 |
| `grafana/` | Grafana 11.6.0 | 3000 |
| `app/` | Cerberus REST API | 4000, 4010 |
All services run in the `cerberus` namespace.
Cerberus ships with a full OpenTelemetry stack: the application exports traces and metrics to an OTel Collector, which fans them out to Tempo (traces) and Prometheus (metrics). Grafana provides a unified UI over both.
| Component | Role | Local port |
|---|---|---|
| OTel Collector | Receives OTLP from the app, routes to backends | 4317 (gRPC), 4318 (HTTP) |
| Grafana Tempo | Distributed trace storage & query | 3200 |
| Prometheus | Metrics storage & query | 9090 |
| Grafana | Dashboards, trace & metric exploration | 3000 |
The application must point at the collector. In config.yaml:
collector:
host: "localhost:4317"
probability: 1.0 # sample rate — lower in production (e.g. 0.05)
metricInterval: "30s"Grafana — http://localhost:3000
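Below is a minimal sketch of how these three values might be wired into the OpenTelemetry Go SDK. The `setupTelemetry` function, option choices, and insecure transport are illustrative assumptions, not Cerberus' actual bootstrap code:

```go
package telemetry

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// setupTelemetry wires the SDK to the collector using the three values
// from config.yaml: collector.host, probability, and metricInterval.
func setupTelemetry(ctx context.Context, host string, probability float64, interval time.Duration) error {
	// Traces: OTLP/gRPC exporter pointed at the collector endpoint.
	traceExp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(host),
		otlptracegrpc.WithInsecure(), // dev setup; use TLS in production
	)
	if err != nil {
		return err
	}
	otel.SetTracerProvider(sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(traceExp),
		// probability maps naturally to a ratio-based sampler; 1.0 keeps every trace.
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(probability))),
	))

	// Metrics: a periodic reader pushes to the collector every metricInterval.
	metricExp, err := otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithEndpoint(host),
		otlpmetricgrpc.WithInsecure(),
	)
	if err != nil {
		return err
	}
	otel.SetMeterProvider(sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(metricExp,
			sdkmetric.WithInterval(interval))),
	))
	return nil
}
```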
### Grafana — http://localhost:3000

Grafana is pre-configured with two datasources (no login required in the default dev setup):
| Datasource | UID | What it shows |
|---|---|---|
| Prometheus | `prometheus` | HTTP request rates, durations, cache hit/miss, active requests |
| Tempo | `tempo` | Distributed traces, per-request spans |
#### Explore traces

- Open Explore (compass icon in the left sidebar).
- Select the Tempo datasource.
- Use Search to filter by service name (`cerberus`), HTTP method, status code, or trace duration.
- Click any trace to open the flame graph and see every span — HTTP server, database queries, cache lookups. A sketch of how such spans are produced follows this list.
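Those database and cache spans come from instrumentation inside the application. As a hedged illustration (the `Cache` type, tracer scope, and span name below are assumptions, not Cerberus' actual code), a cache lookup might be wrapped like this:

```go
package cache

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// Instrumentation scope name is an assumption for this sketch.
var tracer = otel.Tracer("cerberus/cache")

type Cache struct{ data map[string]string }

func (c *Cache) lookup(key string) (string, bool) {
	v, ok := c.data[key]
	return v, ok
}

// Get wraps an in-memory lookup in a child span, so every cache access
// shows up as its own row in the Tempo flame graph.
func (c *Cache) Get(ctx context.Context, key string) (string, bool) {
	_, span := tracer.Start(ctx, "cache.get")
	defer span.End()

	val, ok := c.lookup(key)
	span.SetAttributes(
		attribute.String("cache.key", key),
		attribute.Bool("cache.hit", ok),
	)
	return val, ok
}
```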
#### Explore metrics

- Open Explore and select the Prometheus datasource.
- Useful metric names to start with:
| Metric | Description |
|---|---|
| `cerberus_http_server_request_total` | Total HTTP requests (by method, path, status) |
| `cerberus_http_server_request_duration_seconds` | Request latency histogram |
| `cerberus_http_server_active_requests` | In-flight requests gauge |
| `cerberus_cache_hit_total` | In-memory (L1) cache hits |
| `cerberus_cache_miss_total` | In-memory (L1) cache misses |
| `cerberus_cache_distributed_hit_total` | Redis (L2) cache hits |
| `cerberus_cache_distributed_miss_total` | Redis (L2) cache misses |
| `cerberus_cache_size` | Current number of entries in the in-memory cache |
| `cerberus_http_client_request_total` | Outgoing HTTP requests to downstream services |
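These series start life as OTel instruments in the application and reach Prometheus through the collector. A minimal sketch of how one such counter might be registered and incremented (the scope and instrument names are assumptions; the `_total` suffix is typically appended during Prometheus conversion):

```go
package metrics

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

// Meter scope name is an assumption for this sketch.
var meter = otel.Meter("cerberus/cache")

// Surfaces in Prometheus as cerberus_cache_hit_total.
var cacheHits, _ = meter.Int64Counter("cerberus_cache_hit",
	metric.WithDescription("In-memory (L1) cache hits"))

// recordHit increments the counter once per L1 cache hit.
func recordHit(ctx context.Context) {
	cacheHits.Add(ctx, 1)
}
```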
Example PromQL — HTTP error rate over the last 5 minutes:

```promql
sum(rate(cerberus_http_server_request_total{status=~"5.."}[5m]))
/
sum(rate(cerberus_http_server_request_total[5m]))
```

Example PromQL — p99 request latency:

```promql
histogram_quantile(0.99,
  sum by (le) (rate(cerberus_http_server_request_duration_seconds_bucket[5m]))
)
```
#### Correlating traces and metrics
Grafana links traces to metrics automatically when both datasources are configured. In a Prometheus panel, click a data point and choose View in Tempo to jump directly to traces from that time window.
### Prometheus — http://localhost:9090
Use the Prometheus UI to run ad-hoc PromQL queries or check scrape targets:
- Status → Targets — confirms the OTel Collector scrape is `UP`.
- Graph — run any PromQL expression directly.
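If you would rather script queries than use the UI, the same `/api/v1/query` endpoint behind the Graph tab can be called directly. A small Go sketch (the query itself is just an example):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Any PromQL expression works here; this one totals the request rate.
	q := url.QueryEscape(`sum(rate(cerberus_http_server_request_total[5m]))`)

	resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON: {"status":"success","data":{...}}
}
```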
### Tempo — http://localhost:3200
Tempo exposes an HTTP API for direct trace lookup when needed:
```sh
# Fetch a trace by ID
curl http://localhost:3200/api/traces/<trace-id>
```