@sozercan
Created April 10, 2026 18:16

eraser threat model

Overview

Eraser is a Kubernetes operator that cleans container images from nodes. It runs a controller-manager (main.go, controllers/*) that watches cluster-scoped CRDs (ImageList and ImageJob in api/v1) and manages per-node cleanup pods. It supports two modes:

  • Manual cleanup via ImageList: the imagelist controller builds a ConfigMap with the supplied list and spawns remover pods.
  • Automated cleanup via scheduled ImageJob runs: the imagecollector controller creates a pod template with collector, optional scanner, and remover containers.

The collector enumerates non-running images via the container runtime (pkg/collector, pkg/cri) and passes JSON lists over named pipes to the scanner/remover (pkg/utils, pkg/scanners/template). The remover deletes images through the CRI socket (pkg/remover).

Configuration is provided via an EraserConfig YAML stored in a ConfigMap and mounted into the manager (api/unversioned, main.go). The manager watches the file with inotify and can restart if component enablement changes. Helm charts and kustomize manifests in config/ and charts/ define RBAC, service accounts, and deployment defaults. The manager exposes health/ready probes and metrics, and optional pprof servers for profiling.

Threat model, trust boundaries and assumptions

Assets to protect

  • Integrity and availability of node image caches and the container runtime.
  • Cluster stability and availability (cleanup jobs can disrupt workloads).
  • Configuration integrity (EraserConfig, exclusion lists, component images).
  • Service account tokens and cluster RBAC permissions.
  • Scan results/metrics (may include node names or image identifiers).

Trust boundaries / attacker-controlled inputs

  • Kubernetes API objects if the attacker has RBAC: ImageList/ImageJob CRDs (api/v1), ConfigMaps labelled for exclusion lists, and the eraser-manager-config ConfigMap.
  • Image metadata coming from the CRI (image names, tags, digests) that can be influenced by workloads pulling images.
  • Network endpoints configured by operators (OTEL exporter, Trivy DB), which can be attacker-controlled if misconfigured.
  • Content written to the shared named pipes if any container in the cleanup pod is compromised (collector/scanner/remover).
  • Container images used for collector/scanner/remover if an attacker can supply or tamper with them.

Operator-controlled inputs

  • EraserConfig YAML (runtime socket path, scheduling, component images, extra volumes, node filters).
  • Helm values, kustomize overlays, and deployment manifests (RBAC, service accounts, labels, security contexts).
  • Pull secrets and additional pod labels (env var ERASER_PULL_SECRET_NAMES and config fields).
  • Namespace selection and network policies.

Developer-controlled inputs

  • Default configuration (api/*/config), build artifacts, tests, and CI scripts.

Assumptions

  • Eraser is deployed by cluster administrators; untrusted tenants do not have RBAC to create/modify ImageList/ImageJob or eraser-manager-config.
  • Cleanup pods run trusted images; if custom images are supplied they are treated as fully trusted.
  • The Kubernetes API server enforces authn/authz; there is no app-level authentication.
  • There is no direct external user-facing web UI; exposure is primarily via the Kubernetes API and internal cluster networking.

Attack surface, mitigations and attacker stories

1) Kubernetes API objects (CRDs, ConfigMaps, Pods)

Surface: The controller-manager watches and acts on cluster-scoped ImageList and ImageJob objects (controllers/imagelist, controllers/imagejob). It creates ConfigMaps and PodTemplates and spawns per-node Pods. The configmap controller (controllers/configmap) reacts to updates to eraser-manager-config.

Mitigations/controls

  • RBAC scopes for manager role are declared in config/rbac and charts/eraser/templates.
  • The imagelist controller only processes a single ImageList named "imagelist" (imagelist_controller.go), which reduces the chance of triggering a cleanup accidentally.
  • ImageList-generated ConfigMaps are marked immutable, preventing post-creation mutation (imagelist_controller.go).
  • Job pods use a dedicated service account (eraser-imagejob-pods) without explicit RBAC bindings.

Attacker stories

  • If a tenant can create ImageList objects, they can supply a list containing * or large image sets, forcing deletion of all non-running images on every node (availability impact).
  • If an attacker can mutate eraser-manager-config, they can point component images to malicious registries, enable extra hostPath volumes for the scanner, or change the runtime socket path, leading to node compromise.
  • A user with ConfigMap creation rights and the exclusion label can add or remove exclusions to prevent cleanup (policy bypass) or craft oversized lists for DoS.

2) Pod template construction and runtime socket access

Surface: The imagejob controller builds PodSpecs with a hostPath volume pointing to the runtime socket (copyAndFillTemplateSpec in imagejob_controller.go). The remover/collector/scanner containers access /run/cri/cri.sock (pkg/cri, pkg/collector, pkg/remover).

Mitigations/controls

  • Runtime addresses are validated in api/unversioned/eraserconfig_types.go, and utils.GetConn only permits unix sockets (pkg/utils/utils.go), preventing network CRI connections.
  • Remover containers run with a restrictive SecurityContext (drop capabilities, read-only root, seccomp) defined in pkg/utils/security_context.go.
  • Job pods use RestartPolicyNever and do not request privileged mode by default.
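
The unix-socket-only restriction in the first bullet can be illustrated with a minimal sketch. This is not the code in api/unversioned/eraserconfig_types.go or pkg/utils/utils.go; validateRuntimeAddress and the sample addresses are hypothetical, and the sketch assumes addresses are written in unix:///path form.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateRuntimeAddress is a hypothetical helper (not Eraser's actual
// function). It accepts only unix:// addresses that carry a socket path
// and no host, so network CRI endpoints such as tcp:// are rejected.
func validateRuntimeAddress(addr string) error {
	u, err := url.Parse(addr)
	if err != nil {
		return fmt.Errorf("parse runtime address %q: %w", addr, err)
	}
	if u.Scheme != "unix" {
		return fmt.Errorf("runtime address %q: scheme %q not allowed, only unix sockets", addr, u.Scheme)
	}
	if u.Host != "" || u.Path == "" {
		return fmt.Errorf("runtime address %q: expected the form unix:///absolute/path", addr)
	}
	return nil
}

func main() {
	// Sample inputs are assumptions chosen for illustration.
	for _, addr := range []string{
		"unix:///run/containerd/containerd.sock",
		"tcp://10.0.0.5:2376",
		"/run/containerd/containerd.sock",
	} {
		fmt.Printf("%-42s -> %v\n", addr, validateRuntimeAddress(addr))
	}
}
```

A check like this only constrains the scheme. It does not stop a config author from pointing the path at a different socket on the host, which is the concern raised in the attacker stories below.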

Attacker stories

  • A compromised scanner image or supply-chain attack can use the mounted CRI socket to delete images or create containers on the node, escalating to node-level control.
  • If an attacker can alter the runtime socket path in config, they could cause hostPath mounting of arbitrary files/sockets; this is primarily operator-controlled but is a footgun.
  • Misconfigured node filters may unintentionally schedule cleanup on sensitive nodes, causing service disruption.

3) Inter-container communication (named pipes)

Surface: Collector/scanner/remover pass JSON lists of images via named pipes under /run/eraser.sh/shared-data (pkg/utils/utils.go, pkg/scanners/template). Pipes are created with mode 0644 by default.
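
As a rough illustration of this hand-off, the sketch below sends a JSON image list from a writer to a reader. Several things are assumptions rather than Eraser's code: the Image struct and its JSON field names are hypothetical, an in-memory io.Pipe stands in for the named pipe so the example runs anywhere, and the maxListBytes cap on the reader is an illustrative choice that the source does not describe.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// Image is a hypothetical record for one entry in the list.
type Image struct {
	ImageID string   `json:"image_id"`
	Names   []string `json:"names,omitempty"`
}

// maxListBytes is an assumed upper bound on the encoded list size.
const maxListBytes = 1 << 20

// writeImages encodes the list as a single JSON array.
func writeImages(w io.Writer, images []Image) error {
	return json.NewEncoder(w).Encode(images)
}

// readImages decodes one JSON array, reading at most maxListBytes.
// A longer list is truncated by the limit and fails to decode.
func readImages(r io.Reader) ([]Image, error) {
	var images []Image
	if err := json.NewDecoder(io.LimitReader(r, maxListBytes)).Decode(&images); err != nil {
		return nil, fmt.Errorf("decode image list: %w", err)
	}
	return images, nil
}

// roundTrip passes the list through an in-memory pipe that stands in
// for the FIFO shared between two containers.
func roundTrip(images []Image) ([]Image, error) {
	pr, pw := io.Pipe()
	go func() {
		// CloseWithError(nil) behaves like Close, so the reader sees EOF.
		pw.CloseWithError(writeImages(pw, images))
	}()
	defer pr.Close() // unblocks the writer if the reader stops early
	return readImages(pr)
}

func main() {
	got, err := roundTrip([]Image{
		{ImageID: "sha256:abc", Names: []string{"docker.io/library/nginx:1.25"}},
	})
	fmt.Println(got, err)
}
```

The reader trusts whatever arrives on the pipe, so any container that can write to it controls what the next stage sees.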

Mitigations/controls

  • Pipes exist only within the pod’s shared volume; there is no network exposure.
  • The scanner completion pipe is chmod’d to 0600 (scanner_template.go) to reduce accidental writes.

Attacker stories

  • A malicious scanner container can write arbitrary image IDs to the pipe to influence removal, effectively turning the remover into a “delete all non-running images” tool.
  • Any container crash or pipe corruption can block the pipeline and lead to job failure or repeated retries (availability impact).

4) Image metadata parsing and exclusion lists

Surface: Image names/digests from CRI and exclusion lists from ConfigMaps are parsed and matched in pkg/utils/utils.go and used to decide which images to remove.

Mitigations/controls

  • Parsing uses standard JSON and string matching; no shelling out or command execution.
  • Exclusion lists are label-selected and mounted read-only via ConfigMaps.
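
The string matching mentioned above might look like the following sketch. It is not the code in pkg/utils/utils.go: isExcluded and the sample entries are hypothetical, and the sketch assumes an exclusion entry is either an exact image reference or a prefix ending in "*".

```go
package main

import (
	"fmt"
	"strings"
)

// isExcluded reports whether image matches any exclusion entry.
// Assumption: an entry ending in "*" matches by prefix, and any other
// entry must match the image reference exactly.
func isExcluded(image string, excluded []string) bool {
	for _, entry := range excluded {
		if strings.HasSuffix(entry, "*") {
			if strings.HasPrefix(image, strings.TrimSuffix(entry, "*")) {
				return true
			}
			continue
		}
		if image == entry {
			return true
		}
	}
	return false
}

func main() {
	// Sample entries are assumptions chosen for illustration.
	excluded := []string{"docker.io/library/*", "ghcr.io/example/app:v1"}
	for _, img := range []string{
		"docker.io/library/nginx:1.25",
		"ghcr.io/example/app:v1",
		"ghcr.io/example/app:v2",
	} {
		fmt.Printf("%-32s excluded=%v\n", img, isExcluded(img, excluded))
	}
}
```

Under this matching rule a bare "*" entry excludes every image, so write access to an exclusion list is enough to switch cleanup off entirely.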

Attacker stories

  • Workloads that pull images with extremely large or malformed tags could cause high memory use or noisy logs; impact is generally limited to the cleanup pod.
  • An attacker who controls exclusion ConfigMaps can prevent cleanup of specific images (policy bypass).

5) Network endpoints (metrics, health, profiling)

Surface: The manager exposes health and readiness endpoints on :8081 and metrics on :8889; optional pprof servers bind to localhost in the manager, collector, scanner, and remover (main.go, pkg/*/main).

Mitigations/controls

  • Health/metrics endpoints are typically exposed only inside the cluster; no services are created by default in Helm charts.
  • Optional kube-rbac-proxy patch exists in config/default to protect metrics in some deployment modes.
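
The difference between a localhost-bound listener and one bound to all interfaces (such as :8081 or :8889 above) can be checked with a small sketch. isLoopbackAddr is a hypothetical helper, not part of Eraser, and port 6060 in the sample inputs is an assumed value.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackAddr reports whether a host:port listen address is bound
// to the loopback interface only. An empty host (":8081") means all
// interfaces and therefore returns false.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	// Port 6060 is an assumption for illustration.
	for _, addr := range []string{
		"127.0.0.1:6060", "[::1]:6060", "localhost:6060", ":8081", "0.0.0.0:8889",
	} {
		fmt.Printf("%-16s loopback=%v\n", addr, isLoopbackAddr(addr))
	}
}
```

A loopback-bound server is reachable only from inside the same pod network namespace, while an all-interfaces listener is reachable from anything that can route to the pod IP.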

Attacker stories

  • If metrics or pprof are exposed outside the cluster, an attacker could access internal telemetry, node names, or performance data.
  • Repeated scraping or profiling requests could cause resource exhaustion in the manager pod.

6) Supply chain and customization

Surface: Component images and scanner configuration are fully configurable (api/unversioned/config, docs customization). Scanner volumes can mount additional host paths (imagecollector_controller.go).

Mitigations/controls

  • Configuration is operator-controlled; there is no automatic download of untrusted images in code.
  • Security context for the manager is restrictive (charts and config/manager).

Attacker stories

  • Replacing the scanner/remover image with an untrusted image gives that image access to the CRI socket and shared pipes, enabling node compromise or destructive deletions.
  • Extra hostPath volumes for scanners can unintentionally expose host data if misconfigured.

Out-of-scope considerations

  • Web app classes like CSRF/XSS/SQL injection are largely irrelevant because Eraser has no public HTTP UI and relies on the Kubernetes API for authn/authz.
  • Many attacks require cluster-admin-level RBAC (e.g., creating ImageList, changing eraser-manager-config); in real deployments these are typically restricted to trusted operators.

Criticality calibration (critical, high, medium, low)

Critical

  • Remote code execution or privilege escalation in the manager or cleanup pods triggered by unprivileged Kubernetes users, leading to control over the CRI socket and cluster-wide compromise.
  • Ability for an unprivileged actor to create/modify ImageJob/PodTemplate in a way that mounts arbitrary host paths or launches privileged containers.

High

  • Unauthorized creation or manipulation of ImageList/ImageJob objects or ConfigMaps causing deletion of large swaths of non-running images (cluster-wide DoS).
  • Tampering with eraser-manager-config to point to malicious component images or to enable dangerous volumes, resulting in persistent compromise.
  • Bypassing runtime socket validation to reach a remote CRI endpoint.

Medium

  • Information leakage through metrics/pprof endpoints or logs (node names, image lists).
  • Denial of service through extremely large ImageList or exclusion ConfigMaps causing high memory/CPU use.
  • Pipe communication manipulation by a compromised container leading to incorrect deletion decisions within a single job.

Low

  • Minor robustness issues in parsing image lists/exclusion files, noisy logs, or failure modes that only affect a single cleanup run.
  • Non-security functional bugs that cause cleanup to skip nodes or misreport metrics.