Setting up Rook Ceph and Grafana Cloud to scrape Ceph mgr metrics

Intro

Let's deploy Rook Ceph on a Kubernetes distribution using Vagrant or k3d, and set up the default Grafana dashboard using Grafana Cloud.

Prerequisites

For Vagrant: install Vagrant, VirtualBox (6.1.x, the newest series compatible with Vagrant at the time of writing) and kubectl.

For k3d: install k3d and kubectl.
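
If you don't have them yet, here is a minimal install sketch for a Linux amd64 host, based on the upstream k3d and kubectl install instructions (adjust for your platform):

curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl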

Create environment

Vagrant option

HashiCorp Vagrant is handy for configuring lightweight, reproducible, and portable development environments.

Vagrant config:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu2204"
  config.vm.network "private_network", ip: "192.168.56.10"
  config.vm.disk :disk, size: "10GB", name: "osd"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "8192"
  end
  config.vm.provision "shell", inline: <<-SHELL
    curl https://get.k3s.io | sh
    sudo cp /etc/rancher/k3s/k3s.yaml /vagrant/kubeconfig.k3s
    sed -i "s/127.0.0.1/192.168.56.10/" /vagrant/kubeconfig.k3s
  SHELL
end

What the Vagrant config does:

  • Ubuntu 22.04
  • Attaches a 10GB virtual disk for creating one Ceph OSD
  • Allocates 8GB of memory (can be lowered if needed)
  • Installs k3s inside the VM
  • Copies the kubeconfig generated by k3s (/etc/rancher/k3s/k3s.yaml) to the Vagrant shared directory as /vagrant/kubeconfig.k3s and rewrites the API server address from 127.0.0.1 to the VM's private IP

Run Vagrant:

VAGRANT_EXPERIMENTAL="disks" vagrant up

Export KUBECONFIG:

export KUBECONFIG=$PWD/kubeconfig.k3s

Test cluster connection:

kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
vagrant   Ready    control-plane,master   12s   v1.26.4+k3s1

K3D option

k3d is a lightweight wrapper to run k3s (Rancher Lab’s minimal Kubernetes distribution) in docker.

When using k3d, a loop device can serve as the backing disk for a Ceph OSD.

Create a 10G .img file:

truncate -s 10G mydisk.img

Set up the loop device (may require root):

losetup /dev/loop6 mydisk.img

To use a specific device, define the /dev/loopX path as above; to let losetup pick a free device automatically, run losetup -f mydisk.img

List attached loop devices:

losetup -a
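
When you are done experimenting, the loop device can be detached again (may require root); this is standard losetup usage, not part of the original setup:

losetup -d /dev/loop6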

Create a k3d cluster (the example output below comes from a cluster created with two agent nodes, e.g. k3d cluster create --agents 2):

k3d cluster create

List running clusters:

k3d cluster ls

Test cluster connection:

kubectl get nodes
NAME                       STATUS   ROLES                  AGE   VERSION
k3d-k3s-default-agent-0    Ready    <none>                 53m   v1.26.4+k3s1
k3d-k3s-default-agent-1    Ready    <none>                 53m   v1.26.4+k3s1
k3d-k3s-default-server-0   Ready    control-plane,master   53m   v1.26.4+k3s1

Deploy Rook Ceph

Clone repo:

git clone https://github.com/rook/rook.git

Deploy the CRDs, common resources, and the Rook Ceph operator:

kubectl create \
-f rook/deploy/examples/crds.yaml \
-f rook/deploy/examples/common.yaml \
-f rook/deploy/examples/operator.yaml

!!! For a k3d cluster with loop devices, set ROOK_CEPH_ALLOW_LOOP_DEVICES: "true" in operator.yaml
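
The setting lives in the rook-ceph-operator-config ConfigMap that is part of operator.yaml; trimmed down to the one key that matters here, it looks roughly like this:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: rook-ceph-operator-config
    namespace: rook-ceph
  data:
    ROOK_CEPH_ALLOW_LOOP_DEVICES: "true"  # default is "false"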

Check that the resources were created successfully:

kubectl get pod -n rook-ceph
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-cf4f7dfd4-d5dgt   1/1     Running   0          69s

Deploy Rook Ceph test cluster:

kubectl create -f rook/deploy/examples/cluster-test.yaml

!!! To avoid unintended surprises, set useAllDevices: false in cluster-test.yaml

!!! From the docs: useAllDevices: true or false, indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs. Not recommended unless you have a very controlled environment where you will not risk formatting of devices with existing data. When true, all devices will be used except those with partitions created or a local filesystem. It is overridden by deviceFilter if specified.
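
If you took the Vagrant path and disabled useAllDevices, point Rook at the attached disk explicitly. The device name /dev/sdb below is an assumption for the second VirtualBox disk; check lsblk inside the VM and adjust it if needed:

  storage:
    useAllNodes: true
    useAllDevices: false
    devices:
      - name: /dev/sdb  # assumed name of the attached 10GB disk, verify with lsblk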

!!! For a k3d cluster with loop devices, set the following in cluster-test.yaml:

  storage:
    useAllNodes: true
    useAllDevices: false
    devices:
      - name: /dev/loop6  # set the correct loop device path

Check that the resources were created successfully:

kubectl get pod -n rook-ceph
NAME                                            READY   STATUS      RESTARTS   AGE
rook-ceph-operator-cf4f7dfd4-5njnl              1/1     Running     0          5m29s
csi-cephfsplugin-zjwvw                          2/2     Running     0          3m50s
csi-rbdplugin-mxkvr                             2/2     Running     0          3m50s
csi-cephfsplugin-provisioner-84cc595b78-vc6nd   5/5     Running     0          3m50s
csi-rbdplugin-provisioner-6f6b6b8cd6-bb59v      5/5     Running     0          3m50s
rook-ceph-mon-a-67944c86cd-wqgnx                1/1     Running     0          3m55s
rook-ceph-mgr-a-5fc69f54dc-gwtgz                1/1     Running     0          3m31s
rook-ceph-osd-prepare-vagrant-vd7g6             0/1     Completed   0          3m10s
rook-ceph-osd-0-6ccf6f55c6-sxgln                1/1     Running     0          3m2s
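
You can also check the CephCluster resource itself; once the cluster is up, its phase should report Ready (the exact columns depend on the Rook version):

kubectl -n rook-ceph get cephcluster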

Let's port-forward the Ceph mgr service and check that metrics are exposed:

kubectl port-forward svc/rook-ceph-mgr 9283:9283 -n rook-ceph

See metrics at http://localhost:9283/metrics
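
Or check from the command line, peeking at the first lines of the Prometheus exposition output:

curl -s http://localhost:9283/metrics | head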

Try the interactive toolbox to connect to the Ceph cluster:

kubectl create -f rook/deploy/examples/toolbox.yaml
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph status
  cluster:
    id:     a7f7141c-127f-4a9d-a8d3-25fab107cb47
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 9m)
    mgr: a(active, since 8m)
    osd: 1 osds: 1 up (since 8m), 1 in (since 8m)

  data:
    pools:   1 pools, 32 pgs
    objects: 2 objects, 463 KiB
    usage:   21 MiB used, 10 GiB / 10 GiB avail
    pgs:     32 active+clean
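
A couple of other standard Ceph commands are worth running from the toolbox to inspect OSD and pool state:

ceph osd status
ceph df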

Set up Grafana Cloud

For ease of setup, let's scrape the metrics with the Grafana Agent. Deploy the Grafana Agent to Kubernetes as described in the Grafana Cloud instructions; the Grafana Cloud UI walks you through the exact commands (including the access token) to deploy the agent in Kubernetes.

Find the Ceph integration in Grafana Cloud and read through its installation guide. The Prometheus module in Ceph is already enabled by Rook, so all that is needed is editing the ConfigMap.

Edit grafana-agent ConfigMap:

kubectl edit configmap grafana-agent -n default

Add a new scrape job:

        - job_name: integrations/ceph
          static_configs:
            - targets: ['mgr_ip_address:9283']
              labels:
                rook_cluster: 'rook-ceph'
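
Since the agent runs inside the same cluster, you can avoid hard-coding the mgr IP by targeting the rook-ceph-mgr service DNS name instead; this is a suggestion based on the service used for the port-forward earlier, not part of the integration docs:

        - job_name: integrations/ceph
          static_configs:
            - targets: ['rook-ceph-mgr.rook-ceph.svc.cluster.local:9283']
              labels:
                rook_cluster: 'rook-ceph'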

The Grafana Ceph integration instructs you to add the label ceph_cluster: 'my-cluster', but the Rook Ceph examples assign the label rook_cluster: 'rook-ceph', so if you are using the Rook Ceph defaults, change the label accordingly.

Restart the grafana-agent StatefulSet to apply the ConfigMap change:

kubectl rollout restart statefulset grafana-agent -n default
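
To confirm the restart went through before heading to the dashboard:

kubectl rollout status statefulset grafana-agent -n default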

Navigate to the Grafana Ceph-Cluster dashboard and select the right Prometheus data source and cluster (cluster: my-cluster). The cluster label comes from the external_labels defined in the grafana-agent ConfigMap.
