Eraser threat model

Overview

Eraser is a Kubernetes operator that removes non-running container images from nodes. It runs a controller-manager (main.go, controllers/*) that watches the cluster-scoped CRDs ImageList and ImageJob (api/v1) and manages per-node cleanup pods.

It supports two modes. Manual cleanup is driven by an ImageList: the imagelist controller builds a ConfigMap from the supplied list and spawns remover pods. Automated cleanup runs ImageJobs on a schedule: the imagecollector controller creates a pod template with collector, optional scanner, and remover containers. The collector enumerates non-running images through the container runtime (pkg/collector, pkg/cri) and passes JSON lists over named pipes to the scanner and remover (pkg/utils, pkg/scanners/template). The remover deletes images through the CRI socket (pkg/remover).

Configuration is provided as an EraserConfig YAML document stored in a ConfigMap and mounted into the manager (api/unversioned, main.go). The manager watches the mounted file with inotify and can restart itself if component enablement changes. Kustomize manifests (config/) and Helm charts (charts/) define RBAC, service accounts, and deployment defaults. The manager exposes health and readiness probes and metrics; the manager and the cleanup components can optionally run pprof servers for profiling.
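As a minimal sketch of the restart-on-change behavior, assume the decision reduces to comparing component enablement flags between the old and new EraserConfig. The `componentToggles` type and `needsRestart` function are hypothetical names, not Eraser's actual code. The security-relevant point is that a single edit to eraser-manager-config can change which containers run on every node.

```go
package main

import "fmt"

// componentToggles is a hypothetical subset of EraserConfig holding only the
// per-component enablement flags (field names are illustrative).
type componentToggles struct {
	CollectorEnabled bool
	ScannerEnabled   bool
}

// needsRestart reports whether a reloaded config changes which components are
// enabled. Under the stated assumption, other edits do not force a restart.
func needsRestart(oldCfg, newCfg componentToggles) bool {
	// Struct equality compares every field, so added toggles are covered too.
	return oldCfg != newCfg
}

func main() {
	before := componentToggles{CollectorEnabled: true, ScannerEnabled: true}
	after := componentToggles{CollectorEnabled: true, ScannerEnabled: false}
	fmt.Println("restart needed:", needsRestart(before, after))
}
```

Comparing the whole struct keeps the check correct as toggles are added, provided every field stays comparable.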

Threat model, trust boundaries and assumptions

Assets to protect

Integrity and availability of node image caches and the container runtime.
Cluster stability and availability (cleanup jobs can disrupt workloads).
Configuration integrity (EraserConfig, exclusion lists, component images).
Service account tokens and cluster RBAC permissions.
Scan results/metrics (may include node names or image identifiers).

Trust boundaries / attacker-controlled inputs

Kubernetes API objects, if the attacker holds RBAC to create or modify them: ImageList/ImageJob CRDs (api/v1), ConfigMaps carrying the exclusion-list label, and the eraser-manager-config ConfigMap.
Image metadata coming from the CRI (image names, tags, digests) that can be influenced by workloads pulling images.
Network endpoints configured by operators (OTEL exporter, Trivy DB), which can be attacker-controlled if misconfigured.
Content written to the shared named pipes if any container in the cleanup pod is compromised (collector/scanner/remover).
Container images used for collector/scanner/remover if an attacker can supply or tamper with them.

Operator-controlled inputs

EraserConfig YAML (runtime socket path, scheduling, component images, extra volumes, node filters).
Helm values, kustomize overlays, and deployment manifests (RBAC, service accounts, labels, security contexts).
Pull secrets and additional pod labels (env var ERASER_PULL_SECRET_NAMES and config fields).
Namespace selection and network policies.

Developer-controlled inputs

Default configuration (api/*/config), build artifacts, tests, and CI scripts.

Assumptions

Eraser is deployed by cluster administrators; untrusted tenants do not have RBAC to create/modify ImageList/ImageJob or eraser-manager-config.
Cleanup pods run trusted images; if custom images are supplied they are treated as fully trusted.
The Kubernetes API server enforces authn/authz; there is no app-level authentication.
There is no direct external user-facing web UI; exposure is primarily via the Kubernetes API and internal cluster networking.

Attack surface, mitigations and attacker stories

1) Kubernetes API objects (CRDs, ConfigMaps, Pods)

Surface: The controller-manager watches and acts on cluster-scoped ImageList and ImageJob objects (controllers/imagelist, controllers/imagejob). It creates ConfigMaps and PodTemplates and spawns per-node Pods. The configmap controller (controllers/configmap) reacts to updates to eraser-manager-config.

Mitigations/controls

RBAC scopes for manager role are declared in config/rbac and charts/eraser/templates.
The imagelist controller only processes a single ImageList named "imagelist" (imagelist_controller.go), so additional or mistakenly created ImageList objects are ignored rather than triggering extra cleanup runs.
ImageList-generated ConfigMaps are marked immutable, preventing post-creation mutation (imagelist_controller.go).
Job pods use a dedicated service account (eraser-imagejob-pods) without explicit RBAC bindings.
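The single-name rule can be pictured as a reconcile predicate. This is a hypothetical sketch (`shouldReconcile` is not Eraser's function name); because ImageList is cluster-scoped, the name alone identifies the object. The guard only limits duplicate lists: whoever RBAC allows to create an object named "imagelist" still controls what gets deleted.

```go
package main

import "fmt"

// imageListName is the only ImageList name the controller acts on, per the
// behavior described for imagelist_controller.go.
const imageListName = "imagelist"

// shouldReconcile is a hypothetical predicate: ImageList objects with any other
// name are ignored instead of spawning additional cleanup runs.
func shouldReconcile(name string) bool {
	return name == imageListName
}

func main() {
	for _, n := range []string{"imagelist", "imagelist-2", "my-cleanup"} {
		fmt.Printf("%-12s reconcile=%v\n", n, shouldReconcile(n))
	}
}
```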

Attacker stories

If a tenant can create ImageList objects, they can supply the wildcard "*" to force deletion of all non-running images on every node, or a large explicit image set (availability impact).
If an attacker can mutate eraser-manager-config, they can point component images to malicious registries, enable extra hostPath volumes for the scanner, or change the runtime socket path, leading to node compromise.
A user who can create or edit ConfigMaps carrying the exclusion label can add exclusions to shield images from cleanup (policy bypass), remove exclusions so that protected images become eligible for deletion, or craft oversized lists for DoS.
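To make the blast radius of the wildcard concrete, here is a hypothetical model of how an ImageList spec could map to removal targets on one node. It assumes "*" selects every non-running image and that explicit entries only match images already present and not running; exclusion lists are left out to isolate the wildcard's effect. `selectTargets` is an illustrative name, not Eraser's code.

```go
package main

import (
	"fmt"
	"sort"
)

// selectTargets models (hypothetically) which images one node would remove:
// "*" selects every non-running image; otherwise only listed images that are
// present and not running. Exclusions are intentionally not modeled.
func selectTargets(listed, nonRunning []string) []string {
	for _, img := range listed {
		if img == "*" {
			out := append([]string(nil), nonRunning...)
			sort.Strings(out)
			return out
		}
	}
	want := make(map[string]bool, len(listed))
	for _, img := range listed {
		want[img] = true
	}
	var out []string
	for _, img := range nonRunning {
		if want[img] {
			out = append(out, img)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	nonRunning := []string{"nginx:1.25", "redis:7", "busybox:1.36"}
	fmt.Println(selectTargets([]string{"redis:7"}, nonRunning))
	fmt.Println(selectTargets([]string{"*"}, nonRunning))
}
```

With three non-running images, an explicit one-entry list selects one target while "*" selects all three, which is why write access to ImageList is rated as a high-severity availability risk.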

2) Pod template construction and runtime socket access

Surface: imagejob controller builds PodSpecs with a hostPath volume pointing to the runtime socket (copyAndFillTemplateSpec in imagejob_controller.go). The remover/collector/scanner containers access /run/cri/cri.sock (pkg/cri, pkg/collector, pkg/remover).

Mitigations/controls

Runtime addresses are validated in api/unversioned/eraserconfig_types.go, and utils.GetConn only permits unix sockets (pkg/utils/utils.go), preventing network CRI connections.
Remover containers run with a restrictive SecurityContext (drop capabilities, read-only root, seccomp) defined in pkg/utils/security_context.go.
Job pods use RestartPolicyNever and do not request privileged mode by default.
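A minimal sketch of unix-only address validation, assuming runtime addresses are written as `unix://` URLs such as `unix:///run/containerd/containerd.sock`. `validateRuntimeAddress` is a hypothetical stand-in for the checks in api/unversioned/eraserconfig_types.go and utils.GetConn, not their actual code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateRuntimeAddress (hypothetical) accepts only unix:// URLs with an
// absolute socket path, so a config cannot point the CRI client at TCP.
func validateRuntimeAddress(addr string) error {
	u, err := url.Parse(addr)
	if err != nil {
		return fmt.Errorf("parse runtime address: %w", err)
	}
	if u.Scheme != "unix" {
		return fmt.Errorf("unsupported scheme %q: only unix sockets are allowed", u.Scheme)
	}
	// "unix://run/x.sock" parses with Host "run"; reject it as malformed.
	if u.Host != "" {
		return errors.New("unix socket address must not contain a host")
	}
	if u.Path == "" || u.Path[0] != '/' {
		return errors.New("unix socket path must be absolute")
	}
	return nil
}

func main() {
	for _, addr := range []string{
		"unix:///run/containerd/containerd.sock",
		"tcp://10.0.0.1:2375",
	} {
		fmt.Printf("%s -> %v\n", addr, validateRuntimeAddress(addr))
	}
}
```

A check like this rules out network endpoints but still accepts any unix socket path on the host, which is why the socket-path attacker story for this section remains an operator footgun.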

Attacker stories

A compromised scanner image or supply-chain attack can use the mounted CRI socket to delete images or create containers on the node, escalating to node-level control.
If an attacker can alter the runtime socket path in config, they could cause hostPath mounting of arbitrary files/sockets; this is primarily operator-controlled but is a footgun.
Misconfigured node filters may unintentionally schedule cleanup on sensitive nodes, causing service disruption.
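The node-filter footgun is easiest to see with a model of filter evaluation. The sketch assumes a filter has an include/exclude type and a list of selectors that are either a label key or `key=value`; the `nodeFilter` type and the `example.com/...` labels are hypothetical. Exclude semantics fail open: a typo in a selector silently schedules cleanup onto the nodes it was meant to protect.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeFilter is a hypothetical model: Type is "include" or "exclude", and each
// selector is either a bare label key or a key=value pair.
type nodeFilter struct {
	Type      string
	Selectors []string
}

// matches reports whether any selector matches the node's labels.
func (f nodeFilter) matches(labels map[string]string) bool {
	for _, sel := range f.Selectors {
		key, val, hasVal := strings.Cut(sel, "=")
		got, ok := labels[key]
		if ok && (!hasVal || got == val) {
			return true
		}
	}
	return false
}

// schedulable decides whether a cleanup pod would be placed on the node.
func (f nodeFilter) schedulable(labels map[string]string) bool {
	if f.Type == "include" {
		return f.matches(labels)
	}
	// Exclude semantics: run everywhere except on matching nodes.
	return !f.matches(labels)
}

func main() {
	node := map[string]string{"example.com/sensitive": "true"}
	correct := nodeFilter{Type: "exclude", Selectors: []string{"example.com/sensitive"}}
	typo := nodeFilter{Type: "exclude", Selectors: []string{"example.com/sensitve"}}
	fmt.Println(correct.schedulable(node), typo.schedulable(node))
}
```

For the example node, the correct filter yields `false` (skip the node) while the misspelled one yields `true`.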

3) Inter-container communication (named pipes)

Surface: Collector/scanner/remover pass JSON lists of images via named pipes under /run/eraser.sh/shared-data (pkg/utils/utils.go, pkg/scanners/template). Pipes are created with mode 0644 by default.

Mitigations/controls

Pipes exist only within the pod’s shared volume; there is no network exposure.
The scanner completion pipe is chmod’d to 0600 (scanner_template.go) to reduce accidental writes.

Attacker stories

A malicious scanner container can write arbitrary image IDs to the pipe to influence removal, effectively turning the remover into a “delete all non-running images” tool.
Any container crash or pipe corruption can block the pipeline and lead to job failure or repeated retries (availability impact).

4) Image metadata parsing and exclusion lists

Surface: Image names/digests from CRI and exclusion lists from ConfigMaps are parsed and matched in pkg/utils/utils.go and used to decide which images to remove.

Mitigations/controls

Parsing uses standard JSON and string matching; no shelling out or command execution.
Exclusion lists are label-selected and mounted read-only via ConfigMaps.
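The exclusion check can be sketched as JSON parsing plus pattern matching. This assumes each exclusion ConfigMap holds a document of the form `{"excluded": [...]}` and that a trailing "*" acts as a prefix wildcard; `parseExclusions` and `isExcluded` are hypothetical names, not the functions in pkg/utils/utils.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// exclusionFile models the assumed JSON payload of an exclusion ConfigMap.
type exclusionFile struct {
	Excluded []string `json:"excluded"`
}

// parseExclusions decodes one exclusion document.
func parseExclusions(data []byte) ([]string, error) {
	var f exclusionFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return f.Excluded, nil
}

// isExcluded reports whether image matches any pattern. A pattern ending in
// "*" is a prefix match; any other pattern must match exactly.
func isExcluded(image string, patterns []string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "*") {
			if strings.HasPrefix(image, strings.TrimSuffix(p, "*")) {
				return true
			}
		} else if image == p {
			return true
		}
	}
	return false
}

func main() {
	pats, err := parseExclusions([]byte(`{"excluded": ["docker.io/library/*", "ghcr.io/example/app:1.0"]}`))
	if err != nil {
		panic(err)
	}
	for _, img := range []string{"docker.io/library/nginx:1.25", "ghcr.io/example/app:1.1"} {
		fmt.Println(img, isExcluded(img, pats))
	}
}
```

Under these assumptions a single "*" entry excludes every image, which is how an attacker who controls an exclusion ConfigMap could switch cleanup off entirely.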

Attacker stories

Workloads that pull images with extremely large or malformed tags could cause high memory use or noisy logs; impact is generally limited to the cleanup pod.
An attacker who controls exclusion ConfigMaps can prevent cleanup of specific images (policy bypass).

5) Network endpoints (metrics, health, profiling)

Surface: The manager exposes health and readiness endpoints on :8081 and metrics on :8889; optional pprof servers bind to localhost in the manager, collector, scanner, and remover (main.go, pkg/*/main).

Mitigations/controls

Health/metrics endpoints are typically exposed only inside the cluster; no services are created by default in Helm charts.
An optional kube-rbac-proxy patch in config/default can protect the metrics endpoint in some deployment modes.

Attacker stories

If metrics or pprof are exposed outside the cluster, an attacker could access internal telemetry, node names, or performance data.
Repeated scraping or profiling requests could cause resource exhaustion in the manager pod.

6) Supply chain and customization

Surface: Component images and scanner configuration are fully configurable (api/unversioned/config and the customization documentation). Scanner volumes can mount additional host paths (imagecollector_controller.go).

Mitigations/controls

Configuration is operator-controlled; there is no automatic download of untrusted images in code.
Security context for the manager is restrictive (charts and config/manager).

Attacker stories

Replacing the scanner/remover image with an untrusted image gives that image access to the CRI socket and shared pipes, enabling node compromise or destructive deletions.
Extra hostPath volumes for scanners can unintentionally expose host data if misconfigured.

Out-of-scope considerations

Web app classes like CSRF/XSS/SQL injection are largely irrelevant because Eraser has no public HTTP UI and relies on the Kubernetes API for authn/authz.
Many attacks require cluster-admin-level RBAC (e.g., creating ImageList, changing eraser-manager-config); in real deployments these are typically restricted to trusted operators.

Criticality calibration (critical, high, medium, low)

Critical

Remote code execution or privilege escalation in the manager or cleanup pods triggered by unprivileged Kubernetes users, leading to control over the CRI socket and cluster-wide compromise.
Ability for an unprivileged actor to create/modify ImageJob/PodTemplate in a way that mounts arbitrary host paths or launches privileged containers.

High

Unauthorized creation or manipulation of ImageList/ImageJob objects or configmaps causing deletion of large swaths of non-running images (cluster-wide DoS).
Tampering with eraser-manager-config to point to malicious component images or to enable dangerous volumes, resulting in persistent compromise.
Bypassing the runtime address validation to reach a remote (non-unix-socket) CRI endpoint.

Medium

Information leakage through metrics/pprof endpoints or logs (node names, image lists).
Denial of service through extremely large ImageList or exclusion ConfigMaps causing high memory/CPU use.
Pipe communication manipulation by a compromised container leading to incorrect deletion decisions within a single job.

Low

Minor robustness issues in parsing image lists/exclusion files, noisy logs, or failure modes that only affect a single cleanup run.
Non-security functional bugs that cause cleanup to skip nodes or misreport metrics.

sozercan/eraser-threat-model.md
