@barthez-kenwou
Created June 19, 2026 15:03
Docker Monitoring Stack with Prometheus and Grafana

Overview

Monitoring containerized workloads is essential for maintaining performance, reliability, and security.

This stack collects, stores, and visualizes metrics for:

  • Docker containers
  • Host system resources
  • CPU usage
  • Memory consumption
  • Network activity
  • Disk utilization
  • Container health
  • Infrastructure availability

The solution is based on:

  • Prometheus
  • Grafana
  • cAdvisor
  • Node Exporter
  • Alertmanager

Architecture

+----------------------------------------------------+
|                    Docker Host                     |
+----------------------------------------------------+
           |                     |
           |                     |
           v                     v

+----------------+     +----------------+
| Node Exporter  |     |   cAdvisor     |
+----------------+     +----------------+
           |                     |
           +----------+----------+
                      |
                      v

              +---------------+
              | Prometheus    |
              +---------------+
                      |
          +-----------+-----------+
          |                       |
          v                       v

+----------------+      +----------------+
| Alertmanager   |      | Grafana        |
+----------------+      +----------------+

Features

  • Container monitoring
  • Host monitoring
  • Persistent storage
  • Docker metrics
  • Prometheus alerting
  • Grafana dashboards
  • Resource tracking
  • Production-oriented defaults (restart policies, named volumes, health check)
  • DevOps friendly

Directory Structure

docker-monitoring/

├── docker-compose.yml

├── prometheus/
│   ├── prometheus.yml
│   └── alerts.yml

├── alertmanager/
│   └── alertmanager.yml

├── grafana/
│   └── provisioning/

└── data/
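
The grafana/provisioning/ directory appears in the tree, but the guide does not show what goes in it. As a minimal sketch, assuming the directory is mounted into the Grafana container at /etc/grafana/provisioning (a mount the Compose file in this guide does not define), a datasource file could register Prometheus automatically. The file name is an assumption.

```yaml
# grafana/provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090   # Compose service name and port used in this guide
    isDefault: true
```

With a file like this in place, Grafana would start with the datasource already configured, so imported dashboards can be used without manual setup.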

Docker Compose

docker-compose.yml

services:

  prometheus:

    image: prom/prometheus:latest

    container_name: prometheus

    restart: unless-stopped

    ports:
      - "9090:9090"

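    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      # alerts.yml is referenced by rule_files in prometheus.yml, so it must be mounted too
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml
      - prometheus_data:/prometheus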

    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

  grafana:

    image: grafana/grafana:latest

    container_name: grafana

    restart: unless-stopped

    ports:
      - "3000:3000"

    volumes:
      - grafana_data:/var/lib/grafana

    depends_on:
      - prometheus

  cadvisor:

    image: gcr.io/cadvisor/cadvisor:latest

    container_name: cadvisor

    restart: unless-stopped

    ports:
      - "8080:8080"

    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro

  node-exporter:

    image: prom/node-exporter:latest

    container_name: node-exporter

    restart: unless-stopped

    ports:
      - "9100:9100"

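    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

    # Read host metrics from the mounts above instead of the container's own /proc and /sys
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'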

  alertmanager:

    image: prom/alertmanager:latest

    container_name: alertmanager

    restart: unless-stopped

    ports:
      - "9093:9093"

    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml

volumes:

  prometheus_data:

  grafana_data:

Prometheus Configuration

prometheus.yml

global:

  scrape_interval: 15s

rule_files:
  - alerts.yml

scrape_configs:

  - job_name: prometheus

    static_configs:
      - targets:
          - prometheus:9090

  - job_name: node-exporter

    static_configs:
      - targets:
          - node-exporter:9100

  - job_name: cadvisor

    static_configs:
      - targets:
          - cadvisor:8080

alerting:

  alertmanagers:

    - static_configs:
        - targets:
            - alertmanager:9093

Alertmanager Configuration

alertmanager.yml

route:

  receiver: default

receivers:

  - name: default
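
The default receiver above has no notification settings, so alerts routed to it are not delivered anywhere. A minimal sketch of a route with grouping and a webhook receiver follows. The receiver name, the timing values, and the URL are placeholders chosen for illustration, not part of the original setup.

```yaml
route:
  receiver: ops-webhook             # hypothetical receiver name
  group_by: ['alertname', 'job']    # batch alerts that share these labels
  group_wait: 30s                   # wait before sending the first notification for a group
  group_interval: 5m                # wait before notifying about new alerts in a group
  repeat_interval: 4h               # re-send while an alert keeps firing

receivers:
  - name: ops-webhook
    webhook_configs:
      - url: http://alert-receiver.example.internal:5001/   # hypothetical endpoint
        send_resolved: true
```

A webhook is used here only because it needs no credentials. Slack or email receivers follow the same shape with their own config blocks.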

Docker Container Alert

alerts.yml

groups:

- name: docker-alerts

  rules:

  - alert: ContainerDown

    expr: up == 0

    for: 1m

    labels:
      severity: critical

    annotations:

      summary: "Scrape target down: {{ $labels.job }}"

      description: "Prometheus has been unable to scrape {{ $labels.instance }} for 1 minute. The container may be stopped or unreachable."
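
The rule above only detects scrape targets that stop responding. As a sketch of how host-level rules could sit alongside it, the group below reuses the host memory and CPU expressions from this guide. The group name, the 90% thresholds, and the 5m windows are assumptions to tune per host.

```yaml
# Additional group for alerts.yml: append it under the existing "groups:" key
groups:
  - name: host-alerts   # hypothetical group name
    rules:
      - alert: HostHighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage has stayed above 90% for 5 minutes."

      - alert: HostHighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Average CPU usage has stayed above 90% for 5 minutes."
```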

Starting the Stack

docker compose up -d

Verify:

docker ps

Access URLs

Prometheus:

http://localhost:9090

Grafana:

http://localhost:3000

Alertmanager:

http://localhost:9093

cAdvisor:

http://localhost:8080

Node Exporter:

http://localhost:9100/metrics

Useful Prometheus Queries

Container CPU Usage

rate(container_cpu_usage_seconds_total[5m])

Container Memory Usage

container_memory_usage_bytes

Container Network Traffic

rate(container_network_receive_bytes_total[5m])

Host CPU Usage

100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Host Memory Usage

(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/
node_memory_MemTotal_bytes
* 100
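
If these queries back dashboards that refresh often, they can be precomputed as recording rules. The sketch below is one possible shape. The group name and record names are hypothetical, the name label is assumed to be the container name label that cAdvisor attaches to Docker containers, and the file would need its own entry under rule_files plus a mount into the Prometheus container.

```yaml
# prometheus/recordings.yml (hypothetical file name)
groups:
  - name: dashboard-recordings
    interval: 30s
    rules:
      # Per-container CPU usage in cores, averaged over 5 minutes
      - record: container:cpu_usage_seconds:rate5m
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))

      # Per-container inbound network throughput in bytes per second
      - record: container:network_receive_bytes:rate5m
        expr: sum by (name) (rate(container_network_receive_bytes_total{name!=""}[5m]))

      # Host memory usage as a percentage
      - record: instance:node_memory_used:percent
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```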

Recommended Grafana Dashboards

Import dashboard IDs:

Dashboard            ID
Node Exporter Full   1860
Docker Monitoring    193
cAdvisor Exporter    14282

Security Considerations

Restrict Access

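Never expose these services directly to the Internet:

  • Prometheus
  • Grafana
  • Alertmanager
  • cAdvisor
  • Node Exporter

The Compose file in this guide publishes every port on all host interfaces, so firewall the host or put the services behind:

  • A reverse proxy
  • A VPN
  • Authentication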
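
One way to reduce exposure on a single host is to bind published ports to the loopback interface, so only local processes (for example a reverse proxy on the same machine) can reach them. This is a sketch of in-place edits to the ports entries in docker-compose.yml, not a complete file. It assumes a reverse proxy or SSH tunnel handles remote access.

```yaml
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"   # reachable only from the Docker host itself

  grafana:
    ports:
      - "127.0.0.1:3000:3000"

  alertmanager:
    ports:
      - "127.0.0.1:9093:9093"

  # cadvisor and node-exporter need no published ports for scraping:
  # Prometheus reaches them by service name over the Compose network.
```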

Enable HTTPS

Terminate TLS at a reverse proxy in front of the stack, for example:

  • Caddy
  • Nginx
  • Traefik

Protect Grafana

Grafana ships with a default administrator login:

admin / admin

Change it immediately after the first login.
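
The initial admin password can also be set from the environment instead of the web UI. The fragment below is a sketch of additions to the grafana service. GRAFANA_ADMIN_PASSWORD is a hypothetical variable name that would be supplied through the shell or an .env file, and the setting is expected to apply only when Grafana first creates its database.

```yaml
services:
  grafana:
    environment:
      # Compose aborts with this message if the variable is unset or empty
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:?set GRAFANA_ADMIN_PASSWORD}
      # Disable self-service account creation
      - GF_USERS_ALLOW_SIGN_UP=false
```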

Persist Data

Always mount volumes.

Without persistence:

  • Metrics are lost
  • Dashboards disappear
  • Alert history is erased
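
In the Compose file from this guide, Prometheus and Grafana use named volumes but Alertmanager does not, so its silences and notification log are lost when the container is recreated. A sketch of the relevant changes follows. The alertmanager_data volume name and the 30d retention are assumptions; Prometheus keeps 15 days of data when no retention flag is given.

```yaml
services:
  prometheus:
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # assumed value; size the volume accordingly

  alertmanager:
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager       # default storage path in the prom/alertmanager image

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:
```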

Production Improvements

For enterprise environments, add:

  • Loki
  • Promtail
  • Tempo
  • OpenTelemetry
  • Blackbox Exporter
  • Pushgateway
  • Slack Alerts
  • Email Alerts
  • Microsoft Teams Notifications

Use Cases

This stack is suitable for:

  • Docker Hosts
  • VPS Monitoring
  • Home Labs
  • Kubernetes Nodes
  • Production Servers
  • CI/CD Infrastructure
  • Cloud Workloads

Final Thoughts

Monitoring is not optional in modern infrastructure.

Without observability, incidents become difficult to detect, diagnose, and resolve.

This monitoring stack provides a strong foundation for collecting metrics, visualizing system health, and proactively detecting failures before they impact users.
