@cs224
Created February 17, 2026 07:05
Analysing Cloud-Init on a VPS with `cloud-init-ctx.sh`


When cloud-init behavior is confusing on a VPS, I want a repeatable way to capture the full state in one file. The helper script cloud-init-ctx.sh does exactly that.

Run it from your workstation:

./cloud-init-ctx.sh <ssh-target>
# example:
./cloud-init-ctx.sh root@94.143.231.195

The SSH target must log in as root: the script reads privileged cloud-init state and logs, and it aborts if the remote UID is not 0.

It writes:

<target>.cloud-init-ctx.txt

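The file name is not the raw target: the script replaces @, /, : and spaces with _ and drops any remaining character outside A-Za-z0-9._- before appending the suffix. A minimal sketch of that mapping as a standalone function (ctx_filename is a name made up for this illustration; the script does the same thing inline with bash parameter expansion):

```shell
# ctx_filename: hypothetical helper mirroring the script's filename sanitization.
ctx_filename() {
    # Replace separators with underscores, then drop anything unexpected.
    safe="$(printf '%s' "$1" | sed 's#[@/: ]#_#g' | tr -cd 'A-Za-z0-9._-')"
    printf '%s.cloud-init-ctx.txt\n' "$safe"
}

ctx_filename 'root@94.143.231.195'
ctx_filename 'avoro2-root'
```

So root@94.143.231.195 ends up as root_94.143.231.195.cloud-init-ctx.txt, while an SSH config alias such as avoro2-root is used unchanged.
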
What the script does

  1. Verifies local prerequisites (the ssh client) and SSH connectivity in non-interactive mode (BatchMode=yes, ConnectTimeout=10).
  2. Probes the remote host for OS, privilege level, and command availability.
  3. Collects cloud-init context in one SSH session.
  4. Writes a structured report with clear section markers.

This makes debugging easier because config, runtime state, and logs end up in one place.
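
The probe in step 2 reports its findings as marker lines of the form __KEY__=value, which the local side then picks apart. A rough sketch of that parsing step, using canned probe output instead of a real SSH call; the sample values and the probe_value helper are assumptions for illustration (the script itself uses one awk call per key):

```shell
# Canned probe output standing in for what the remote side prints (sample values).
PROBE_OUT='__OS_ID__=debian
__OS_LIKE__=
__UID__=0
__MISSING_REQUIRED__=
__MISSING_OPTIONAL__=netplan tree'

# probe_value: hypothetical helper; prints the value stored under one marker key.
probe_value() {
    printf '%s\n' "$PROBE_OUT" | sed -n "s/^__${1}__=//p"
}

# The same decisions the script makes: require root, report optional gaps.
if [ "$(probe_value UID)" = "0" ]; then
    echo "remote user is root"
fi
echo "missing optional: $(probe_value MISSING_OPTIONAL)"
```

Stripping the key prefix with sed keeps everything after the first =, so a value that itself contains = would survive intact.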

Key analysis commands it runs

The script runs many commands, but these are the high-signal ones.

  • Cloud-init health: cloud-init --version
  • Cloud-init health: cloud-init status --long
  • Cloud-init identity: cloud-init query instance_id
  • Cloud-init keys: cloud-init query --list-keys
  • Cloud-init merged data: cloud-init query --all
  • Cloud-init timing: cloud-init analyze show
  • Cloud-init timing: cloud-init analyze blame
  • Effective config: cat /etc/cloud/cloud.cfg
  • Effective config: cat /etc/cloud/cloud.cfg.d/*
  • Result status: cat /var/lib/cloud/data/status.json
  • Result status: cat /var/lib/cloud/data/result.json
  • Seed evidence: cat /var/lib/cloud/seed/nocloud-net/*
  • Active instance symlink: readlink -f /var/lib/cloud/instance
  • Resolved user-data: cat /var/lib/cloud/instances/*/user-data.txt
  • Datasource evidence: cat /var/lib/cloud/instances/*/datasource
  • Runtime data: cat /run/cloud-init/status.json
  • Runtime data: cat /run/cloud-init/instance-data.json
  • Runtime data: cat /run/cloud-init/instance-data-sensitive.json
  • Systemd wiring: systemctl list-unit-files 'cloud-init*'
  • Systemd state: systemctl status cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service cloud-init-network.service cloud-init-main.service
  • Network state: ip addr show
  • Network state: ip route show
  • Network state: ip -6 route show
  • Resolver state: cat /etc/resolv.conf
  • Main log: tail -n 2000 /var/log/cloud-init.log
  • Output log: tail -n 2000 /var/log/cloud-init-output.log
  • Journal evidence: journalctl -u 'cloud-init*' -b --no-pager | tail -n 2000
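
Most of these run through a best-effort wrapper, so one missing tool does not abort the whole capture. A simplified sketch of that pattern follows. run_section is a hypothetical stand-in for the script's cmd_or_note, which takes a command string, evaluates it, and buffers the output in a temp file; this version takes the command as plain arguments instead.

```shell
# run_section: hypothetical, simplified variant of the script's cmd_or_note helper.
run_section() {
    title="$1"; shift
    printf '\n===== %s =====\n' "$title"
    # Run the command; on failure, leave a note in the report instead of aborting.
    if ! "$@" 2>&1; then
        echo "(command failed or not available): $*"
    fi
}

run_section "SYSTEM: uname" uname -s
# On a machine without cloud-init this prints the failure note and carries on.
run_section "CLOUD-INIT: version" cloud-init --version
```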

Reading the output quickly

A fast triage loop:

  1. CLOUD-INIT: status --long to confirm whether cloud-init finished and with which datasource.
  2. CLOUD-INIT: analyze blame to spot slow modules.
  3. FILE: /var/lib/cloud/data/result.json to check errors/recoverable_errors.
  4. INSTANCE: enumerate /var/lib/cloud/instances/* to inspect resolved user-data and module semaphores.
  5. TAIL ... /var/log/cloud-init.log and JOURNAL: cloud-init units for final root-cause evidence.
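
Because every section starts with a ===== TITLE ===== marker line, a single section is easy to pull out of a large report. A small sketch, assuming that marker format; show_section is a helper name invented here, not part of the script:

```shell
# show_section: hypothetical helper; prints one section of a report file.
# $1 = report file, $2 = exact section title (the text between the ===== markers).
show_section() {
    awk -v title="$2" '
        # Every marker line switches printing on or off.
        /^===== .* =====$/ { in_section = ($0 == "===== " title " =====") }
        in_section
    ' "$1"
}
```

For example, show_section avoro2-root.cloud-init-ctx.txt "CLOUD-INIT: status --long" prints only the status block, and grep '^===== ' on the report lists all available section titles.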

In the example capture (avoro2-root.cloud-init-ctx.txt), the host reports status: done and DataSourceNoCloud [seed=/dev/sr0], which is exactly the kind of confirmation you want early.
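
That early check can be scripted as well. The sketch below prints the first status: line and the first datasource class name it finds in a report. ctx_summary is a hypothetical helper, and it assumes the status --long section is the first place in the report where either string appears.

```shell
# ctx_summary: hypothetical helper; $1 is the path to a *.cloud-init-ctx.txt report.
ctx_summary() {
    # First line that starts with "status:" (expected in the status --long section).
    sed -n '/^status:/{p;q;}' "$1"
    # First datasource class name mentioned anywhere, e.g. DataSourceNoCloud.
    sed -n 's/.*\(DataSource[A-Za-z0-9]*\).*/\1/p' "$1" | head -n 1
}
```

Run it as ctx_summary avoro2-root.cloud-init-ctx.txt; anything other than a done status is a cue to go straight to result.json and the logs.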

Safety note

The report may contain secrets or sensitive bootstrap data. Treat *.cloud-init-ctx.txt files like credentials and avoid posting them publicly without redaction.
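
As a first pass before sharing, obvious key/value secrets can be masked with a filter like the one below. This is only a sketch: redact_ctx is a hypothetical helper, the key names are illustrative guesses rather than a complete list, matching is case-sensitive, and it will not catch SSH private keys or other multi-line material, so a manual review is still needed.

```shell
# redact_ctx: hypothetical filter; masks the value on lines that look like secrets.
# The key names are illustrative guesses, not an exhaustive list.
redact_ctx() {
    sed -E 's/((password|passwd|token|secret)[^:=]*[:=][[:space:]]*).*/\1[REDACTED]/'
}

# Demo on two sample lines: the first is masked, the second passes through.
printf 'password: hunter2\nhostname: web1\n' | redact_ctx
```

Typical use would be redact_ctx < avoro2-root.cloud-init-ctx.txt > avoro2-root.redacted.txt, combined with chmod 600 on the original report.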

#!/usr/bin/env bash
set -euo pipefail
usage() {
  cat <<'EOF'
Usage:
  cloud-init-ctx.sh <ssh-target>
Examples:
  cloud-init-ctx.sh avoro2-root
  cloud-init-ctx.sh root@94.143.231.195
  cloud-init-ctx.sh user@host.example.com
This script connects via SSH and writes a local context file:
  <target>.cloud-init-ctx.txt
Expected target access:
  - SSH login should land you on the target host as root.
  - Example: root@host.example.com
EOF
}

if [[ $# -ne 1 ]]; then
  usage
  exit 2
fi
TARGET="$1"
# Sanitize for filename while keeping it recognizable.
SAFE_TARGET="$TARGET"
SAFE_TARGET="${SAFE_TARGET//@/_}"
SAFE_TARGET="${SAFE_TARGET//\//_}"
SAFE_TARGET="${SAFE_TARGET//:/_}"
SAFE_TARGET="${SAFE_TARGET// /_}"
SAFE_TARGET="$(printf '%s' "$SAFE_TARGET" | tr -cd 'A-Za-z0-9._-')"
OUTFILE="${SAFE_TARGET}.cloud-init-ctx.txt"
# Local prerequisites (minimal; should exist on any normal Linux system).
LOCAL_REQUIRED=(ssh)
missing_local=()
for c in "${LOCAL_REQUIRED[@]}"; do
  command -v "$c" >/dev/null 2>&1 || missing_local+=("$c")
done
if [[ ${#missing_local[@]} -gt 0 ]]; then
  printf 'Error: missing required local commands: %s\n' "${missing_local[*]}" >&2
  exit 1
fi
SSH_OPTS=(
  -o BatchMode=yes
  -o ConnectTimeout=10
)
# Helper: run a command remotely (non-interactive).
ssh_remote() {
  ssh "${SSH_OPTS[@]}" "$TARGET" "$@"
}
# 1) Connectivity check.
if ! ssh_remote "true" >/dev/null 2>&1; then
  cat >&2 <<EOF
Error: unable to connect to '$TARGET' via SSH in BatchMode.
Troubleshooting:
  - Ensure you can run: ssh $TARGET
  - Ensure host key is accepted (first connect may require interaction)
  - Ensure your key agent / IdentityFile is configured for that host
EOF
  exit 1
fi
# 2) Probe remote OS + command availability in ONE go.
PROBE_OUT="$(ssh_remote 'sh -s' <<'REMOTE_PROBE'
set -eu
# OS detection (best-effort)
OS_ID="unknown"
OS_LIKE=""
if [ -r /etc/os-release ]; then
  # shellcheck disable=SC1091
  . /etc/os-release
  OS_ID="${ID:-unknown}"
  OS_LIKE="${ID_LIKE:-}"
fi
# Privilege detection
UID_NOW="$(id -u 2>/dev/null || echo 99999)"
# Required commands (core to collecting cloud-init context)
REQUIRED="sh uname date cat ls find grep sed awk readlink stat cloud-init"
# Optional commands (nice-to-have; script will degrade gracefully without them)
OPTIONAL_COMMON="systemctl journalctl ip hostnamectl systemd-detect-virt resolvectl netplan ifquery tree"
OPTIONAL_DEB="dpkg-query"
OPTIONAL_RPM="rpm"
OPTIONAL="$OPTIONAL_COMMON"
case " $OS_ID $OS_LIKE " in
  *" debian "*|*" ubuntu "*)
    OPTIONAL="$OPTIONAL $OPTIONAL_DEB"
    ;;
  *" rhel "*|*" centos "*|*" fedora "*|*" rocky "*|*" alma "*|*" suse "*|*" sles "*)
    OPTIONAL="$OPTIONAL $OPTIONAL_RPM"
    ;;
  *)
    OPTIONAL="$OPTIONAL $OPTIONAL_DEB $OPTIONAL_RPM"
    ;;
esac
missing_required=""
for c in $REQUIRED; do
  if ! command -v "$c" >/dev/null 2>&1; then
    missing_required="$missing_required $c"
  fi
done
missing_optional=""
for c in $OPTIONAL; do
  if ! command -v "$c" >/dev/null 2>&1; then
    missing_optional="$missing_optional $c"
  fi
done
# Emit one marker line per result; the local side parses these.
printf '__OS_ID__=%s\n' "$OS_ID"
printf '__OS_LIKE__=%s\n' "$OS_LIKE"
printf '__UID__=%s\n' "$UID_NOW"
printf '__MISSING_REQUIRED__=%s\n' "${missing_required# }"
printf '__MISSING_OPTIONAL__=%s\n' "${missing_optional# }"
REMOTE_PROBE
)"
# Parse probe output.
OS_ID="$(printf '%s\n' "$PROBE_OUT" | awk -F= '/^__OS_ID__=/{print $2}')"
OS_LIKE="$(printf '%s\n' "$PROBE_OUT" | awk -F= '/^__OS_LIKE__=/{print $2}')"
REMOTE_UID="$(printf '%s\n' "$PROBE_OUT" | awk -F= '/^__UID__=/{print $2}')"
MISSING_REQUIRED="$(printf '%s\n' "$PROBE_OUT" | awk -F= '/^__MISSING_REQUIRED__=/{print $2}')"
MISSING_OPTIONAL="$(printf '%s\n' "$PROBE_OUT" | awk -F= '/^__MISSING_OPTIONAL__=/{print $2}')"
if [[ "$REMOTE_UID" != "0" ]]; then
  cat >&2 <<EOF
Error: remote user for '$TARGET' is not root (uid=$REMOTE_UID).
This script expects root access on the target host so it can read
cloud-init state, logs, and system files without partial output.
Use a root target, for example:
  cloud-init-ctx.sh root@host.example.com
EOF
  exit 1
fi
if [[ -n "${MISSING_REQUIRED// }" ]]; then
  echo "Remote '$TARGET' is missing required commands:" >&2
  echo " $MISSING_REQUIRED" >&2
  echo >&2
  # Best-effort install hint (Debian/Ubuntu supported explicitly).
  if [[ "$OS_ID" == "debian" || "$OS_ID" == "ubuntu" || "$OS_LIKE" == *"debian"* ]]; then
    # Map commands -> packages (best-effort; may vary slightly by distro).
    declare -A CMD2PKG=(
      [cloud-init]=cloud-init
      [find]=findutils
      [grep]=grep
      [sed]=sed
      [awk]=mawk
      [readlink]=coreutils
      [stat]=coreutils
      [uname]=coreutils
      [date]=coreutils
      [cat]=coreutils
      [ls]=coreutils
      [sh]=dash
    )
    pkgs=()
    for cmd in $MISSING_REQUIRED; do
      pkg="${CMD2PKG[$cmd]:-}"
      if [[ -n "$pkg" ]]; then pkgs+=("$pkg"); else pkgs+=("$cmd"); fi
    done
    # de-duplicate
    uniq_pkgs="$(printf "%s\n" "${pkgs[@]}" | awk '!seen[$0]++' | tr '\n' ' ')"
    cat >&2 <<EOF
Install hint (Debian/Ubuntu):
  sudo apt-get update
  sudo apt-get install -y $uniq_pkgs
EOF
  else
    cat >&2 <<EOF
Install hint:
  Install packages providing these commands using your distro package manager.
EOF
  fi
  exit 1
fi
# 3) Inform about optional missing commands (do not fail).
if [[ -n "${MISSING_OPTIONAL// }" ]]; then
  cat >&2 <<EOF
Note: optional remote commands are missing (collection will degrade gracefully):
  $MISSING_OPTIONAL
If you want fuller output, install as appropriate for your distro.
For Debian/Ubuntu, typical packages:
  sudo apt-get install -y iproute2 systemd systemd-resolved netplan.io tree
EOF
fi
# 4) Collect context in one SSH session and write local file.
# Write header locally (includes capture time and target string).
{
  echo "### cloud-init context capture"
  echo "target: $TARGET"
  echo "local_time_utc: $(date -u +"%Y-%m-%dT%H:%M:%SZ")"
  echo "----------------------------------------"
} > "$OUTFILE"
# Stream remote collection into the local file.
ssh_remote "sh -s" <<'REMOTE_COLLECT' >> "$OUTFILE"
set -eu
section() {
  echo
  echo "===== $1 ====="
}
cmd_or_note() {
  # Usage: cmd_or_note "title" "command"
  title="$1"; shift
  section "$title"
  tmp="$(mktemp)"
  if eval "$@" >"$tmp" 2>&1; then
    cat "$tmp"
  else
    cat "$tmp"
    echo "(command failed or not available): $*"
  fi
  rm -f "$tmp"
}
show_file() {
  # Usage: show_file /path/to/file
  f="$1"
  section "FILE: $f"
  if [ -e "$f" ]; then
    ls -l "$f" 2>&1 || true
    echo "---"
    cat "$f" 2>&1 || echo "(unable to read)"
  else
    echo "(missing)"
  fi
}
list_dir() {
  # Usage: list_dir /path
  d="$1"
  section "DIR: $d"
  if [ -d "$d" ]; then
    ls -la "$d" 2>&1 || true
  else
    echo "(missing)"
  fi
}
tail_file() {
  # Usage: tail_file /path lines
  f="$1"
  n="$2"
  section "TAIL ${n} lines: $f"
  if [ -e "$f" ]; then
    ls -l "$f" 2>&1 || true
    echo "---"
    tail -n "$n" "$f" 2>&1 || echo "(unable to read)"
  else
    echo "(missing)"
  fi
}
# System identity / baseline
cmd_or_note "SYSTEM: identity" "echo \"remote_time: \$(date -Is 2>/dev/null || date)\"; echo \"uname: \$(uname -a 2>/dev/null || true)\"; echo \"id: \$(id 2>/dev/null || true)\"; echo \"hostname: \$(hostname 2>/dev/null || true)\""
show_file "/etc/os-release"
cmd_or_note "SYSTEM: virtualization (best-effort)" "command -v systemd-detect-virt >/dev/null 2>&1 && systemd-detect-virt -v || true"
# cloud-init versions and status
cmd_or_note "CLOUD-INIT: version" "cloud-init --version 2>&1 || true"
cmd_or_note "CLOUD-INIT: status --long" "cloud-init status --long 2>&1 || true"
cmd_or_note "CLOUD-INIT: query ds" "cloud-init query ds 2>&1 || true"
cmd_or_note "CLOUD-INIT: query instance_id" "cloud-init query instance_id 2>&1 || true"
cmd_or_note "CLOUD-INIT: query --all (may include sensitive data)" "cloud-init query --all 2>&1 || true"
cmd_or_note "CLOUD-INIT: analyze show (best-effort)" "cloud-init analyze show 2>&1 || true"
cmd_or_note "CLOUD-INIT: analyze blame (best-effort)" "cloud-init analyze blame 2>&1 || true"
cmd_or_note "CLOUD-INIT: query --list-keys" "cloud-init query --list-keys 2>&1 || true"
cmd_or_note "CLOUD-INIT: query userdata" "cloud-init query userdata 2>&1 || true"
# cloud-init config files
list_dir "/etc/cloud"
show_file "/etc/cloud/cloud.cfg"
section "FILES: /etc/cloud/cloud.cfg.d/*"
if [ -d /etc/cloud/cloud.cfg.d ]; then
  ls -la /etc/cloud/cloud.cfg.d 2>/dev/null || true
  for f in /etc/cloud/cloud.cfg.d/*; do
    [ -f "$f" ] || continue
    echo
    echo "----- $f"
    ls -l "$f" 2>&1 || true
    echo "---"
    cat "$f" 2>&1 || true
  done
else
  echo "(missing)"
fi
# cloud-init state directories
list_dir "/var/lib/cloud"
list_dir "/var/lib/cloud/data"
show_file "/var/lib/cloud/data/instance-id"
show_file "/var/lib/cloud/data/previous-datasource"
show_file "/var/lib/cloud/data/status.json"
show_file "/var/lib/cloud/data/result.json"
# Seed information (NoCloud seeded mode appears here)
list_dir "/var/lib/cloud/seed"
list_dir "/var/lib/cloud/seed/nocloud"
list_dir "/var/lib/cloud/seed/nocloud-net"
section "FILES: /var/lib/cloud/seed/nocloud-net/*"
if [ -d /var/lib/cloud/seed/nocloud-net ]; then
  for f in /var/lib/cloud/seed/nocloud-net/*; do
    [ -f "$f" ] || continue
    echo
    echo "----- $f"
    ls -l "$f" 2>&1 || true
    echo "---"
    cat "$f" 2>&1 || true
  done
else
  echo "(missing)"
fi
# Per-instance cache (this exists for any datasource)
list_dir "/var/lib/cloud/instances"
section "INSTANCE: current symlink resolution"
if [ -L /var/lib/cloud/instance ] || [ -e /var/lib/cloud/instance ]; then
  (readlink -f /var/lib/cloud/instance 2>/dev/null || true)
else
  echo "(missing /var/lib/cloud/instance)"
fi
section "INSTANCE: enumerate /var/lib/cloud/instances/*"
if [ -d /var/lib/cloud/instances ]; then
  for inst in /var/lib/cloud/instances/*; do
    [ -d "$inst" ] || continue
    echo
    echo "----- INSTANCE DIR: $inst"
    ls -la "$inst" 2>/dev/null || true
    for f in \
      "$inst/datasource" \
      "$inst/cloud-config.txt" \
      "$inst/user-data.txt" \
      "$inst/vendor-data.txt" \
      "$inst/vendor-cloud-config.txt" \
      "$inst/vendor-data2.txt" \
      "$inst/network-config.json" \
      "$inst/obj.pkl" \
      "$inst/boot-finished"
    do
      if [ -e "$f" ]; then
        echo
        echo "FILE: $f"
        ls -l "$f" 2>/dev/null || true
        echo "---"
        # Avoid binary spam from obj.pkl by printing metadata only
        case "$f" in
          *.pkl)
            echo "(binary pickle; not dumped)"
            ;;
          *)
            cat "$f" 2>/dev/null || echo "(unable to read)"
            ;;
        esac
      fi
    done
    # Semaphore markers show which modules ran once.
    if [ -d "$inst/sem" ]; then
      echo
      echo "SEM: $inst/sem"
      ls -la "$inst/sem" 2>/dev/null || true
    fi
    # Scripts left by cloud-init
    if [ -d "$inst/scripts" ]; then
      echo
      echo "SCRIPTS: $inst/scripts"
      find "$inst/scripts" -maxdepth 3 -type f -print 2>/dev/null || true
    fi
  done
else
  echo "(missing)"
fi
# Runtime instance-data (helpful for datasource/network debugging)
list_dir "/run/cloud-init"
show_file "/run/cloud-init/status.json"
show_file "/run/cloud-init/instance-data.json"
show_file "/run/cloud-init/instance-data-sensitive.json"
# cloud-init related init/systemd artifacts
section "INIT/SYSTEMD: cloud-init units and enablement (best-effort)"
if command -v systemctl >/dev/null 2>&1; then
  systemctl list-unit-files 'cloud-init*' 2>&1 || true
  echo
  systemctl status cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service cloud-init-network.service cloud-init-main.service 2>&1 || true
else
  echo "(systemctl not available; listing legacy init scripts if present)"
  ls -la /etc/init.d/cloud-init* 2>/dev/null || true
  find /etc/rc*.d -maxdepth 1 -name '*cloud-init*' -print 2>/dev/null || true
fi
section "ETC: files matching *cloud-init*"
find /etc -name "*cloud-init*" -print 2>/dev/null || true
# Networking: rendered config & live state
section "NETWORK: rendered config files (common locations)"
# ifupdown ENI
show_file "/etc/network/interfaces"
section "FILES: /etc/network/interfaces.d/*"
if [ -d /etc/network/interfaces.d ]; then
  ls -la /etc/network/interfaces.d 2>/dev/null || true
  for f in /etc/network/interfaces.d/*; do
    [ -f "$f" ] || continue
    echo
    echo "----- $f"
    ls -l "$f" 2>/dev/null || true
    echo "---"
    cat "$f" 2>/dev/null || true
  done
else
  echo "(missing)"
fi
# systemd-networkd
section "FILES: /etc/systemd/network/*"
if [ -d /etc/systemd/network ]; then
  ls -la /etc/systemd/network 2>/dev/null || true
  # Iterate once and filter, so a file that is both *.network and
  # *cloud-init* is not dumped twice.
  for f in /etc/systemd/network/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.network|*cloud-init*) ;;
      *) continue ;;
    esac
    echo
    echo "----- $f"
    ls -l "$f" 2>/dev/null || true
    echo "---"
    cat "$f" 2>/dev/null || true
  done
else
  echo "(missing)"
fi
# netplan (mostly Ubuntu; sometimes present)
section "FILES: /etc/netplan/*"
if [ -d /etc/netplan ]; then
  ls -la /etc/netplan 2>/dev/null || true
  for f in /etc/netplan/*; do
    [ -f "$f" ] || continue
    echo
    echo "----- $f"
    ls -l "$f" 2>/dev/null || true
    echo "---"
    cat "$f" 2>/dev/null || true
  done
else
  echo "(missing)"
fi
# Live state
cmd_or_note "NETWORK: ip addr (best-effort)" "command -v ip >/dev/null 2>&1 && ip addr show 2>&1 || true"
cmd_or_note "NETWORK: ip route (best-effort)" "command -v ip >/dev/null 2>&1 && ip route show 2>&1 || true"
cmd_or_note "NETWORK: ip -6 route (best-effort)" "command -v ip >/dev/null 2>&1 && ip -6 route show 2>&1 || true"
show_file "/etc/resolv.conf"
cmd_or_note "NETWORK: resolvectl status (best-effort)" "command -v resolvectl >/dev/null 2>&1 && resolvectl status 2>&1 || true"
# Logs (bounded)
tail_file "/var/log/cloud-init.log" 2000
tail_file "/var/log/cloud-init-output.log" 2000
section "JOURNAL: cloud-init units (best-effort; bounded)"
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u 'cloud-init*' -b --no-pager 2>&1 | tail -n 2000 || true
else
  echo "(journalctl not available)"
fi
section "END OF CLOUD-INIT CONTEXT"
REMOTE_COLLECT
echo "Wrote context file: $OUTFILE"
echo "Warning: the output may include sensitive data (user-data, vendor-data, instance-data). Store it securely."