RetroDash // LATENCY

GUIDE

Latency details: percentiles, distribution and SLOs

Requirements

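The latency panels need a Prometheus histogram. The queries in this guide use http_request_duration_seconds, which exposes the _bucket, _sum and _count series.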

Compatibility Notes

Rate interval and homelab traffic

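The $rate_interval variable sets the time window of the rate() calls, including those wrapped in histogram_quantile(). The default is 5m. rate() needs at least two samples inside the window, so keep it several times larger than your scrape interval.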

"No data" in latency panels is expected behavior when there is no active HTTP traffic hitting your services — it is not a configuration error. The panels will populate as soon as requests start flowing.

Panel Layout

┌────────────────────────────────────────┐
│    LATENCY — PERCENTILES & SLOs        │
├────────────────────────────────────────┤
│  P50      │  P95      │  P99           │
│  (stat)   │  (stat)   │  (stat)        │
├────────────────────────────────────────┤
│  Distribution Time Series [12 cols]    │
│  (lines: P50, P95, P99)                │
├────────────────────────────────────────┤
│  Heatmap [6 cols] │ Table by Endpoint  │
│  (buckets time)   │ (top handlers)     │
├────────────────────────────────────────┤
│  SLO Alert — P99 < 500ms [12 cols]     │
└────────────────────────────────────────┘

Panel Customization

P50 / P95 / P99 Stats

Three stat panels showing percentiles in milliseconds. The sum by (le) aggregation merges all series into one histogram, so each panel shows a single value.

histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000

For a custom metric that is already recorded in milliseconds, drop the * 1000 conversion:

histogram_quantile(0.95, sum by (le) (rate(my_api_latency_ms_bucket[5m])))

Change thresholds: Panel → Thresholds → adjust the limits (e.g., 100 ms yellow, 200 ms red)
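To make the stat values easier to interpret, here is a minimal Python sketch of how histogram_quantile() estimates a percentile: it finds the bucket that contains the target rank and interpolates linearly inside it. The function name estimate_quantile and the sample bucket counts are illustrative assumptions, not part of RetroDash or Prometheus, and edge cases that Prometheus handles (such as non-positive bucket bounds) are left out.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (le, count) pairs.

    Simplified sketch of the interpolation behind histogram_quantile();
    assumes positive bounds, at least one finite bucket, and a final +Inf bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]          # the +Inf bucket counts every observation
    if total == 0:
        return math.nan             # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        return buckets[-2][0]       # cap at the highest finite bound
    lower, below = buckets[idx - 1] if idx > 0 else (0.0, 0)
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)

# Hypothetical cumulative counts for le = 0.1 s, 0.5 s, 1 s, +Inf
sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p95_ms = estimate_quantile(0.95, sample) * 1000
```

With these counts the 95th-percentile rank falls in the 0.5 s to 1 s bucket, so the estimate can only be as precise as that bucket is narrow.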

Distribution Time Series

Line chart showing evolution of P50, P95, P99 over time.

Add percentiles: In Legend/Aliases, customize labels:

Percentile  Query                           Label
P25         histogram_quantile(0.25, ...)   P25
P50         histogram_quantile(0.50, ...)   P50 (Median)
P90         histogram_quantile(0.90, ...)   P90
P95         histogram_quantile(0.95, ...)   P95
P99         histogram_quantile(0.99, ...)   P99

Change rate window: replace [5m] with [1m] for more sensitivity, or with [15m] for smoother lines
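To illustrate the sensitivity trade-off, the following Python sketch computes a simplified per-second rate over a short and a long trailing window of the same counter. simple_rate and the sample data are hypothetical, and the calculation ignores the extrapolation and counter-reset handling that Prometheus rate() performs.

```python
def simple_rate(samples, now, window):
    """Per-second increase between the first and last sample in the window.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Simplified: no extrapolation and no counter-reset handling.
    """
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return None                 # fewer than two samples: nothing to compute
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical counter scraped every 30 s: steady 1 req/s until t=840 ...
samples = [(t, t) for t in range(0, 841, 30)]
# ... followed by a one-minute burst at 10 req/s.
samples += [(870, 840 + 300), (900, 840 + 600)]

rate_1m = simple_rate(samples, now=900, window=60)     # 10.0: tracks the burst
rate_15m = simple_rate(samples, now=900, window=900)   # about 1.6: burst averaged out
```

A window shorter than two scrape intervals returns nothing at all, which is one more reason a panel can show "No data" on a low-traffic setup.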

Heatmap

Visualization of latency distribution by buckets over time.

rate(http_request_duration_seconds_bucket[5m])

To show only specific buckets, filter on the le label:

rate(http_request_duration_seconds_bucket{le=~"0.01|0.05|0.1|0.5|1"}[5m])

Adjust resolution: Panel → Heatmap options → Cell size; switch Bucket between discrete and continuous
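Prometheus histogram buckets are cumulative: each le bucket counts every request at or below its bound. A heatmap cell instead represents the requests that fall between two adjacent bounds. The Python sketch below shows that conversion on hypothetical values; per_bucket_counts is an illustrative name, not a Grafana or Prometheus function.

```python
import math

def per_bucket_counts(cumulative):
    """Convert cumulative (le, count) pairs into (lower, upper, count) cells."""
    cells = []
    prev_bound, prev_count = 0.0, 0
    for bound, count in sorted(cumulative):
        # Requests in this cell = this bucket minus everything already counted below it.
        cells.append((prev_bound, bound, count - prev_count))
        prev_bound, prev_count = bound, count
    return cells

# Hypothetical cumulative counts for le = 0.01, 0.05, 0.1, 0.5, 1, +Inf
cumulative = [(0.01, 5), (0.05, 20), (0.1, 60), (0.5, 95), (1.0, 99), (math.inf, 100)]
cells = per_bucket_counts(cumulative)
```

Filtering out intermediate le values merges the corresponding cells, because each remaining bucket still counts everything below its bound.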

Table by Endpoint

Dynamic table with average latency (in ms) for the slowest endpoints/handlers.

topk(10,
  sum by (endpoint) (rate(http_request_duration_seconds_sum[5m]))
  / sum by (endpoint) (rate(http_request_duration_seconds_count[5m]))
) * 1000

Change top N endpoints: Replace topk(10 with topk(20 or topk(5

Filter by method:

topk(10,
  sum by (endpoint, method) (rate(http_request_duration_seconds_sum{method="GET"}[5m]))
  / sum by (endpoint, method) (rate(http_request_duration_seconds_count{method="GET"}[5m]))
) * 1000
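A sketch of why the ranking should be applied to the computed average rather than to each operand separately: PromQL binary operators only return series whose labels match on both sides, so two independent topk() selections can silently drop endpoints. The Python below mimics that matching with dictionaries; the helper names, endpoint names and rates are hypothetical.

```python
def topk(k, series):
    """Keep the k largest values, like PromQL topk() on an instant vector."""
    return dict(sorted(series.items(), key=lambda kv: kv[1], reverse=True)[:k])

def divide(left, right):
    """One-to-one vector matching: only labels present on both sides survive."""
    return {key: left[key] / right[key] for key in left if key in right}

# Hypothetical per-endpoint rates: seconds spent per second, and requests per second.
duration_sum = {"/report": 4.0, "/search": 1.5, "/health": 0.02}
request_count = {"/report": 2.0, "/search": 30.0, "/health": 20.0}

# Ranking each side first: the two top-2 sets only share "/search".
separate = divide(topk(2, duration_sum), topk(2, request_count))

# Ranking the ratio: the slowest endpoint on average, "/report", comes first.
combined = topk(2, divide(duration_sum, request_count))
```

Here the separately ranked version loses /report, the endpoint with the highest average latency, because it is not among the two busiest endpoints.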

SLO Alert — P99 < 500ms

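Stat or gauge that flags when P99 exceeds the SLO. The comparison acts as a filter: the query returns the P99 value while it is below the threshold and no data once the threshold is exceeded.

histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000 < 500

Change SLO to 300ms:

histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000 < 300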

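For an alert per endpoint, keep le in the aggregation so that histogram_quantile() still receives the bucket boundaries:

histogram_quantile(0.99, sum by (endpoint, le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000 < 250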
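An SLO of the form "P99 < 500 ms" can also be read as "at least 99% of requests finish within 500 ms". The Python sketch below checks that reading directly from cumulative bucket counts, which avoids interpolation when a bucket boundary sits exactly at the threshold. The function name, the counts and the presence of a 0.5 s bucket are assumptions for illustration.

```python
import math

def slo_compliance(buckets, threshold_s):
    """Fraction of requests at or below threshold_s, from cumulative (le, count) pairs.

    Requires a bucket whose bound equals threshold_s exactly.
    """
    counts = dict(buckets)
    total = counts[math.inf]        # the +Inf bucket counts every request
    if total == 0:
        return None                 # no traffic, nothing to evaluate
    return counts[threshold_s] / total

# Hypothetical cumulative counts for le = 0.1, 0.25, 0.5, 1, +Inf
buckets = [(0.1, 700), (0.25, 930), (0.5, 985), (1.0, 998), (math.inf, 1000)]
ratio = slo_compliance(buckets, 0.5)
slo_met = ratio >= 0.99             # False here: 98.5% finished within 500 ms
```

Assuming such a bucket exists, the same ratio can be expressed in PromQL by dividing the rate of the le="0.5" bucket by the rate of http_request_duration_seconds_count.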

Alert color: In Thresholds, configure:

Change Color Theme

Theme   Primary   Secondary   OK Color
GREEN   #33FF00   #22BB00     #33FF00
AMBER   #FFB000   #CC8C00     #FFB000
BLUE    #00BFFF   #0099CC     #00BFFF

Adapt to Your Resolution

Type                Width (cols)         Height (rows)
Mobile (< 768px)    6 (stack vertical)   8
Tablet 10"          12 (full width)      10
Tablet 12.9"        12 (full width)      12
Desktop 1920x1080   24 (2 columns)       8

Import in Grafana

  1. Export the LATENCY dashboard as JSON from Grafana
  2. Or copy the URI: $GRAFANA_URL/api/dashboards/db/latency
  3. Go to Dashboards → Import
  4. Paste the JSON or URL
  5. Make sure to select the Prometheus data source
  6. Verify the metrics in each panel
  7. Save and add to favorites

Advanced Tips

Correlate latency with errors

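histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket{status!~"5.."}[5m]))) * 1000

This shows P95 only for requests that did not return a 5xx status. Compare it with the unfiltered P95 to see how much server errors affect latency.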

Breakdown by instance

histogram_quantile(0.95,
  sum by (instance, le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000

Useful if you have multiple servers and want to detect instance anomalies.

Degradation detection

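deriv(rate(http_request_duration_seconds_bucket{le="1"}[5m])[10m:1m])

Shows the trend over the last 10 minutes at 1m resolution, as the per-second slope of the rate of requests completing within 1 s. With steady traffic, a negative slope means fewer requests are finishing within 1 s.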
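A trend over a fixed lookback can be quantified as the slope of a least-squares line through the samples, which is how PromQL's deriv() function estimates change. The Python sketch below computes that slope for ten hypothetical one-minute samples of a latency series; trend_slope is an illustrative name.

```python
def trend_slope(samples):
    """Least-squares slope of (timestamp_seconds, value) samples, in value units per second."""
    n = len(samples)
    if n < 2:
        return None                 # a slope needs at least two points
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var

# Hypothetical latency in ms, one sample per minute, rising 6 ms per minute.
samples = [(60 * i, 120 + 6 * i) for i in range(10)]
slope_per_min = trend_slope(samples) * 60
```

A positive slope sustained over the window points to degradation, while a value near zero indicates stable latency.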