GUIDE
Overview dashboard with summary of the 4 Golden Signals
For this dashboard to work correctly you need:
- Prometheus datasource configured in Grafana
- node_exporter for node metrics (CPU, memory, disk)
- kube-state-metrics for pod status and K8s resources
- up metric for health check (scrape_duration_seconds)

These dashboards are tested with Grafana 12.3 and kube-prometheus-stack. To import: go to Dashboards → Import, paste the JSON, then select your Prometheus datasource from the dropdown.
- job="node-exporter" is the default job label set by kube-prometheus-stack. Your setup may use job="node" or another value; adjust accordingly.
- The $job_node variable (in the dashboard JSON) lets you switch between node-exporter and node without editing individual panel queries.
- Set the $job_node variable value in Dashboard Settings → Variables.

┌────────────────────────────────────────┐
│ OVERVIEW - 4 GOLDEN SIGNALS            │
├────────────────────────────────────────┤
│ Health   │ Avg      │ Error   │ CPU    │
│ Score    │ Latency  │ Rate    │ Satur. │
├──────────┼──────────┼─────────┼────────┤
│ Signals Summary (4 KPIs)      [6 cols] │
├────────────────────────────────────────┤
│ Node Status Grid   │ Pod Status        │
│ (table)            │ (table)           │
├────────────────────────────────────────┤
│ Top Resource Consumers       [12 cols] │
└────────────────────────────────────────┘
Shows the percentage of "up" targets in Prometheus.
Default query:
100 * count(up == 1) / count(up)
To change: If you use a custom exporter, replace up with your metric:
100 * count(my_service_healthy == 1) / count(my_service_healthy)
Thresholds: edit them under Thresholds in the panel editor (defaults: 90 yellow, 75 red).
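The arithmetic behind the health score can be sketched in a few lines of Python. This is only an illustration of what the default query computes; the function name and the sample values are hypothetical and not part of the dashboard.

```python
def health_score(up_values):
    """Return the percentage of targets whose `up` sample equals 1."""
    if not up_values:
        raise ValueError("no targets scraped")
    healthy = sum(1 for value in up_values if value == 1)
    return 100 * healthy / len(up_values)


# Hypothetical scrape: three targets up, one down.
print(health_score([1, 1, 0, 1]))  # 75.0
```

One difference worth knowing: when every target is down, the PromQL expression returns no data rather than 0, because count(up == 1) is then an empty result.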
Average request latency in milliseconds.
avg(rate(http_request_duration_seconds_sum[5m])) / avg(rate(http_request_duration_seconds_count[5m])) * 1000
Change metric: If you don't use Prometheus HTTP conventions, use:
avg(my_request_latency_ms)
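To see why dividing the two rates gives an average latency, note that both rates use the same window, so the window length cancels and what remains is (increase in total seconds) / (increase in request count). A minimal sketch with hypothetical counter samples (the function name is an assumption for illustration):

```python
def avg_latency_ms(sum_start, sum_end, count_start, count_end):
    """Average latency in ms between two samples of the _sum and _count counters."""
    requests = count_end - count_start
    if requests <= 0:
        return None  # no requests in the window, or a counter reset
    # Seconds spent in the window divided by requests served, converted to ms.
    return 1000 * (sum_end - sum_start) / requests


# Hypothetical samples taken 5 minutes apart:
# 30 s of cumulative request time spread over 200 requests.
print(avg_latency_ms(120.0, 150.0, 1000, 1200))  # 150.0
```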
Percentage of requests with 5xx status code.
100 * sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
Customization: To include 4xx errors:
100 * sum(rate(http_requests_total{status=~"[45].."}[5m]))
/ sum(rate(http_requests_total[5m]))
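The difference between the two status matchers can be checked offline. The sketch below uses re.fullmatch because PromQL regex matchers are anchored at both ends; the request counts and the function name are hypothetical.

```python
import re


def error_rate(requests_by_status, pattern="5.."):
    """Percentage of requests whose status code fully matches `pattern`."""
    total = sum(requests_by_status.values())
    if total == 0:
        return None  # no traffic, so the ratio is undefined
    errors = sum(
        count
        for status, count in requests_by_status.items()
        if re.fullmatch(pattern, status)
    )
    return 100 * errors / total


# Hypothetical request counts per status code over the window.
counts = {"200": 940, "404": 40, "500": 15, "503": 5}
print(error_rate(counts))            # 2.0  (5xx only)
print(error_rate(counts, "[45].."))  # 6.0  (4xx and 5xx)
```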
CPU usage as a percentage.
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Filter by specific instance:
100 - (avg(rate(node_cpu_seconds_total{mode="idle",instance="server1:9100"}[5m])) * 100)
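The CPU query works backwards from idle time: the per-second rate of the idle counter is the fraction of time a core spent idle, and usage is whatever is left. A small sketch with hypothetical per-core idle fractions (the function name is an assumption):

```python
def cpu_usage_percent(idle_fractions):
    """CPU usage as 100 minus the average idle percentage across cores."""
    if not idle_fractions:
        raise ValueError("no CPU series")
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


# Hypothetical idle fractions for four cores over the last 5 minutes.
print(cpu_usage_percent([0.75, 0.5, 1.0, 0.75]))  # 25.0
```

Without an instance filter the average runs over every core of every node, so one saturated node can be hidden by several idle ones.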
Table with 4 main KPIs. To customize:
- Change http_request_duration_seconds to your metric
- Shorten the rate window (5m → 1m for more sensitivity)

Dynamic table with node status.
max by (instance) (up{job="node"})
Change columns: In panel → Columns, add:
- node_cpu_seconds_total → Total CPU
- node_memory_MemAvailable_bytes → Available memory
- node_filesystem_avail_bytes → Disk space

Table of pods with status.
max by (pod, namespace) (kube_pod_status_phase)
Filter by namespace: In Dashboard Settings → Variables, create a query variable named namespace:
label_values(kube_pod_info, namespace)
Then use in the query:
max by (pod, namespace) (kube_pod_status_phase{namespace="$namespace"})
Ranking of pods by CPU/memory.
topk(10, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total[5m])) * 100)
Change top N: Replace topk(10 with topk(20 for the top 20.
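What topk does to the summed per-pod series can be mimicked with heapq.nlargest. This is a sketch for intuition only; the pod names and percentages are hypothetical.

```python
import heapq


def topk(k, usage):
    """Return the k (key, value) pairs with the largest values, highest first."""
    return heapq.nlargest(k, usage.items(), key=lambda item: item[1])


# Hypothetical CPU usage (percent of one core) keyed by (namespace, pod).
usage = {
    ("default", "api"): 42.0,
    ("default", "worker"): 7.5,
    ("kube-system", "coredns"): 1.2,
}
for (namespace, pod), percent in topk(2, usage):
    print(f"{namespace}/{pod}: {percent}%")
```

Raising k only lengthens the list; pods below the cut-off are dropped from the panel, not aggregated into an "other" row.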
This dashboard includes 3 predefined themes. Use the buttons above or edit the JSON:
| Theme | Primary | Glow | Background |
|---|---|---|---|
| GREEN | #33FF00 | rgba(51,255,0,0.5) | #0A1A0A |
| AMBER | #FFB000 | rgba(255,176,0,0.5) | #0D1117 |
| BLUE | #00BFFF | rgba(0,191,255,0.5) | #0A0A1A |
| Device | Resolution | GridPos Height |
|---|---|---|
| iPad Pro 12.9 | 2048×2732 | 12-14 |
| iPad Air 10.9 | 1640×2360 | 10-12 |
| Samsung Tab S9 | 1752×2800 | 11-13 |
| Google Pixel Tablet | 1600×2560 | 10-12 |
| Desktop 1920×1080 | Full | 8-10 |
To adjust: In Grafana, edit each panel → Panel tab → Panel options → modify gridPos.h
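If you would rather change every panel at once, the exported dashboard JSON can be edited with a short script before importing. The sketch below assumes the usual layout of a top-level panels list whose entries carry a gridPos object with h, w, x and y; the function name and the sample dashboard are hypothetical.

```python
import json


def set_panel_height(dashboard, height):
    """Return a copy of the dashboard with every panel's gridPos.h set to `height`."""
    updated = json.loads(json.dumps(dashboard))  # deep copy via a JSON round trip
    for panel in updated.get("panels", []):
        if "gridPos" in panel:
            panel["gridPos"]["h"] = height
    return updated


# Hypothetical one-panel dashboard; a real export has many more fields.
dashboard = {
    "panels": [
        {"title": "Health Score", "gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}}
    ]
}
tablet = set_panel_height(dashboard, 12)
print(tablet["panels"][0]["gridPos"])  # {'h': 12, 'w': 6, 'x': 0, 'y': 0}
```

The sketch leaves each panel's y offset untouched and skips panels nested inside collapsed rows, so check the layout after importing and adjust if panels overlap.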