# Latency Details: Percentiles, Distribution and SLOs
For latency metrics you need Prometheus histograms:

- `http_request_duration_seconds` (standard OpenMetrics), or a custom metric such as `api_latency_milliseconds`
- Labels: `method`, `endpoint`, `status`

The `$rate_interval` variable controls the time window used in `rate()` and `histogram_quantile()` queries. The default is `5m`. If your data is sparse, raise it to `15m` or `30m` to get stable results instead of spiky gaps; to change it, open the `rate_interval` variable and edit the default value.

"No data" in latency panels is expected behavior when there is no active HTTP traffic hitting your services; it is not a configuration error. The panels will populate as soon as requests start flowing.
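As background, here is a minimal Python sketch (hypothetical bucket bounds and durations) of how a client library fills the `_bucket`, `_sum`, and `_count` series that all of the queries below rely on:

```python
import math

# Hypothetical latency buckets in seconds (upper bounds, as in
# http_request_duration_seconds_bucket{le="..."}).
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0, math.inf]

def observe(hist, seconds):
    """Record one request duration the way a Prometheus histogram does."""
    hist["count"] += 1
    hist["sum"] += seconds
    for le in BUCKETS:
        if seconds <= le:
            hist["bucket"][le] += 1  # buckets are cumulative: every le >= value counts

hist = {"count": 0, "sum": 0.0, "bucket": {le: 0 for le in BUCKETS}}
for d in [0.004, 0.02, 0.03, 0.2, 0.7, 2.5]:
    observe(hist, d)

print(hist["bucket"][0.05])      # 3  (0.004, 0.02, 0.03)
print(hist["bucket"][math.inf])  # 6  (the +Inf bucket always equals _count)
```

Because the buckets are cumulative, the `+Inf` bucket always matches `_count`; this is why `histogram_quantile()` can work from the `_bucket` series alone.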
```
┌─────────────────────────────────────────────┐
│        LATENCY — PERCENTILES & SLOs         │
├─────────────────────────────────────────────┤
│     P50      │     P95      │     P99       │
│    (stat)    │    (stat)    │    (stat)     │
├─────────────────────────────────────────────┤
│  Distribution Time Series        [12 cols]  │
│  (lines: P50, P95, P99)                     │
├─────────────────────────────────────────────┤
│  Heatmap        [6 cols] │ Table by Endpoint│
│  (buckets × time)        │ (top handlers)   │
├─────────────────────────────────────────────┤
│  SLO Alert — P99 < 500ms         [12 cols]  │
└─────────────────────────────────────────────┘
```
Three stat panels showing percentiles in milliseconds.
Query P50:

```
histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m])) * 1000
```

Query P95:

```
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) * 1000
```

Query P99:

```
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) * 1000
```
For a custom metric (e.g., one already in ms):

```
histogram_quantile(0.95, rate(my_api_latency_ms_bucket[5m]))
```

Change thresholds: Panel → Thresholds → adjust the limits (e.g., 100 yellow, 200 red)
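For intuition about what the stat queries return, here is a small Python sketch (assumed bucket data) of the linear interpolation that `histogram_quantile()` performs over cumulative buckets:

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets [(le, count), ...],
    interpolating linearly within a bucket, like PromQL's histogram_quantile."""
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return prev_le  # falls in +Inf bucket: return last finite bound
            # Linear interpolation between the bucket's lower and upper bounds.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# Hypothetical cumulative counts over one rate window.
b = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, b) * 1000)  # 750.0 (ms)
```

This is also why the estimate can only be as precise as the bucket layout: a P95 that lands between `le="0.5"` and `le="1"` is interpolated, not measured.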
Line chart showing evolution of P50, P95, P99 over time.
Add percentiles: In Legend/Aliases, customize labels:
| Percentile | Query | Label |
|---|---|---|
| P25 | histogram_quantile(0.25, ...) | P25 |
| P50 | histogram_quantile(0.50, ...) | P50 (Median) |
| P90 | histogram_quantile(0.90, ...) | P90 |
| P95 | histogram_quantile(0.95, ...) | P95 |
| P99 | histogram_quantile(0.99, ...) | P99 |
Change rate window: Edit [5m] → [1m] for more sensitivity or [15m] to smooth
Visualization of latency distribution by buckets over time.
Default query (all buckets):

```
rate(http_request_duration_seconds_bucket[5m])
```
To show only specific buckets, filter by `le` values:

```
rate(http_request_duration_seconds_bucket{le=~"0.01|0.05|0.1|0.5|1"}[5m])
```
Adjust resolution: Panel → Heatmap options → Cell size, change Bucket to discrete vs. continuous
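A note on what each heatmap cell shows: `_bucket` series are cumulative, so the cell needs the per-bucket difference (Grafana derives this when the Prometheus query's format is set to Heatmap). A minimal sketch of that de-accumulation, with made-up counts:

```python
import math

def decumulate(buckets):
    """Convert cumulative le-buckets [(le, count), ...] into per-bucket
    counts, which is what a heatmap cell actually displays."""
    out, prev = [], 0
    for le, count in sorted(buckets):
        out.append((le, count - prev))
        prev = count
    return out

cumulative = [(0.01, 12), (0.05, 30), (0.1, 41), (0.5, 50), (1.0, 52), (math.inf, 52)]
print(decumulate(cumulative))
# [(0.01, 12), (0.05, 18), (0.1, 11), (0.5, 9), (1.0, 2), (inf, 0)]
```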
Dynamic table with latency by endpoint/handler.
Base query (dividing two separate `topk` results can return mismatched series, so rank the ratio instead):

```
topk(10,
  sum by (endpoint) (rate(http_request_duration_seconds_sum[5m]))
    / sum by (endpoint) (rate(http_request_duration_seconds_count[5m]))
) * 1000
```
Change top N endpoints: Replace topk(10 with topk(20 or topk(5
Filter by method:
```
topk(10,
  sum by (endpoint, method) (rate(http_request_duration_seconds_sum{method="GET"}[5m]))
    / sum by (endpoint, method) (rate(http_request_duration_seconds_count{method="GET"}[5m]))
) * 1000
```
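The table query is just rate(`_sum`) / rate(`_count`) per endpoint, converted to ms and ranked. A Python sketch with hypothetical per-endpoint samples:

```python
# Hypothetical per-endpoint rate() samples, as the two halves of the
# table query would return them.
rate_sum = {"/api/users": 4.2, "/api/orders": 1.5, "/health": 0.05}      # seconds/sec
rate_count = {"/api/users": 60.0, "/api/orders": 10.0, "/health": 50.0}  # req/sec

def top_avg_latency_ms(n):
    """Average latency per endpoint in ms (sum/count * 1000), top n slowest."""
    avg = {ep: rate_sum[ep] / rate_count[ep] * 1000 for ep in rate_sum}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)[:n]

print(top_avg_latency_ms(2))
```

Note this yields the *mean* latency per endpoint, not a percentile; it is cheap and good for ranking, but a slow tail can hide behind a low mean.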
Stat or gauge that alerts if P99 exceeds the SLO.
Default SLO (500ms):

```
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) * 1000 < 500
```

Change SLO to 300ms:

```
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) * 1000 < 300
```

For a per-endpoint alert (the `le` label must survive the aggregation, or `histogram_quantile()` returns nothing):

```
histogram_quantile(0.99, sum by (endpoint, le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000 < 250
```
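An alternative way to express the same SLO is as a ratio of good requests, reading the `le="0.5"` bucket directly (in PromQL, roughly `sum(rate(..._bucket{le="0.5"}[5m])) / sum(rate(..._count[5m])) >= 0.99`). A sketch with made-up counts:

```python
# Hypothetical counts over the alert window.
under_500ms = 9_920   # e.g. increase(..._bucket{le="0.5"}[5m])
total = 10_000        # e.g. increase(..._count[5m])

def slo_ok(good, total, objective=0.99):
    """True if at least `objective` of requests met the latency target."""
    return total == 0 or good / total >= objective

print(slo_ok(under_500ms, total))  # True (99.2% of requests under 500ms)
```

The ratio form is exact (no bucket interpolation) but only works if a bucket boundary happens to sit at your SLO threshold.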
Alert color: In Thresholds, configure:
| Theme | Primary | Secondary | OK Color |
|---|---|---|---|
| GREEN | #33FF00 | #22BB00 | #33FF00 |
| AMBER | #FFB000 | #CC8C00 | #FFB000 |
| BLUE | #00BFFF | #0099CC | #00BFFF |
Suggested panel sizing by device:

| Type | Width (cols) | Height (rows) |
|---|---|---|
| Mobile (< 768px) | 6 (stack vertical) | 8 |
| Tablet 10" | 12 (full width) | 10 |
| Tablet 12.9" | 12 (full width) | 12 |
| Desktop 1920x1080 | 24 (2 columns) | 8 |
Dashboard API endpoint: `$GRAFANA_URL/api/dashboards/db/latency`

```
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{status!~"5.."}[5m])) * 1000
```
This shows P95 only for successful requests (excluding 5xx).
```
histogram_quantile(0.95, sum by (instance, le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000
```

Useful if you have multiple servers and want to detect per-instance anomalies. Note that `le` must be kept in the `sum by` clause for `histogram_quantile()` to work.
```
avg_over_time(rate(http_request_duration_seconds_bucket{le="1"}[5m])[10m:1m])
```

Shows the trend of the sub-1s request rate over the last 10 minutes at 1m resolution, using a PromQL subquery (`rate()` cannot wrap `increase()` directly, since both take a range vector and return an instant vector).
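The subquery evaluates the inner `rate()` once per minute across the 10-minute window; the `avg_over_time`-style smoothing of those samples can be sketched as (hypothetical samples):

```python
# One rate sample per minute over the last 10 minutes (hypothetical req/sec
# of requests completing under 1s), as the subquery would evaluate them.
samples = [40, 42, 41, 39, 20, 18, 17, 19, 21, 22]

def trend(values, window=3):
    """Simple moving average over the last `window` samples,
    mimicking avg_over_time applied to a subquery."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

smoothed = trend(samples)
print(round(smoothed[-1], 1))  # 20.7, the average of the last 3 samples
```

The step change around minute 5 stays visible but the minute-to-minute noise is damped, which is exactly what a longer outer window buys you.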