Dgraph Observability Stack — Full Grafana LGTM Monitoring Example #9657
matthewmcneely announced in Announcements
Replies: 1 comment

This looks awesome! Going to have to upgrade and try it out.
Following the OpenTelemetry improvements introduced in v25.3, we've published a ready-to-run observability example in the dgraph-experimental repo that pairs a Dgraph cluster with the full Grafana LGTM stack (Loki, Grafana, Tempo, Prometheus) and OpenTelemetry. A single `docker compose up` gives you pre-built dashboards for cluster health, query/mutation throughput, transaction rates, Badger storage metrics, and Go runtime stats, all scraped via Prometheus.

Traces flow from Dgraph through an OpenTelemetry Collector into Tempo, where you can search and visualize them with TraceQL. Dgraph v25.3 extends the `--trace` flag with a new `service` option (`--trace "jaeger=otel-collector:4318; ratio=1.0; service=alpha1;"`) that assigns a custom service name to each node while preserving the `dgraph.alpha` or `dgraph.zero` namespace. This makes it easy to tell individual nodes apart in Grafana's trace explorer when running a multi-node cluster.
Also new in v25.3, the `--feature-flags` option supports `log-slow-query-threshold` (e.g., `--feature-flags "log-slow-query-threshold=500ms"`), which emits a structured JSON log entry for any query or mutation that exceeds the configured duration. Each entry includes the trace ID, span ID, a timing breakdown (parsing, processing, encoding), and the query text. Entries are written to stderr and collected automatically by Promtail into Loki. Combined with Grafana's trace-to-logs correlation, you can click a slow span in Tempo and jump directly to the matching log entry in Loki, giving you end-to-end visibility from trace latency down to the exact query payload. Note: because this feature logs query text, use it with care in environments with strict data privacy requirements.

For the image below and those shown in the README in dgraph-experimental, here is an alpha start command showing the new `--trace` and `--feature-flags` params. Note that three Alphas, each managing a separate group, were used to create the screenshots, and they were loaded with the 1million movie dataset.