Gatekeeper threat model

Overview

Gatekeeper is a Kubernetes admission controller and policy enforcement system built on Open Policy Agent (OPA). It runs as a controller manager inside the cluster, exposes validating and mutating webhooks, reconciles CRDs for ConstraintTemplates, Constraints, Mutators, and external data Providers, performs periodic audits, and optionally exports audit results. Runtime entrypoints are in main.go, with core admission logic in pkg/webhook/ and policy engines in the OPA and CEL drivers. A separate CLI (cmd/gator) supports local testing and OCI bundle ingestion.

Gatekeeper is security-critical because it sits in the admission path for all Kubernetes resource creation/updates. It typically runs with broad read access to cluster objects and with permissions to manage webhook configuration, CRDs, and status resources. Misconfiguration or vulnerabilities can cause policy bypass, denial of service, or cluster-wide impact.

Threat model: trust boundaries and assumptions

Trust boundaries

Kubernetes API server → Gatekeeper webhook: AdmissionReview requests are derived from untrusted user inputs but are delivered by the API server over TLS.
Cluster state/CRDs: ConstraintTemplates, Constraints, Mutators, Config, SyncSets, and Provider CRDs are generally operator-controlled.
External data providers: network calls to external services and their responses cross the cluster boundary.
Export destinations: disk paths and Dapr components are external sinks.
Metrics/health endpoints: cluster-internal HTTP surfaces.
CLI/gator and build tooling: developer-controlled execution environment.

Attacker-controlled inputs

Resource manifests submitted to the Kubernetes API (admission payloads, including arbitrary JSON objects, labels, annotations, etc.).
Any object content referenced by referential constraints or audit.
Potentially external data provider responses if the provider is malicious or compromised.

Operator-controlled inputs

Gatekeeper configuration flags and CRDs (Config, SyncSet, Provider, Mutators, ConstraintTemplates, Constraints).
Webhook configuration and certificate settings (client-ca-name, TLS version, cert rotation settings).
Export destinations and audit settings.

Developer-controlled inputs

CLI gator policy bundles (including OCI image references), test fixtures, CI tooling.

Assumptions

Kubernetes API server authentication/authorization is correctly enforced; only trusted operators can create or modify Gatekeeper CRDs and webhook configuration.
Network policies or cluster topology restrict external access to the webhook, metrics, and health endpoints.
Gatekeeper’s service account and secret volumes are protected; compromise of these is out of scope.
External data providers and export endpoints are configured by trusted administrators.

Attack surface, mitigations and attacker stories

Admission and mutation webhooks

Webhook endpoints /v1/admit and /v1/mutate are registered in pkg/webhook/policy.go and pkg/webhook/mutation.go.
TLS configuration and cert rotation live in main.go and pkg/webhook/common.go; minimum TLS version is configurable, and client-ca-name can enforce client cert validation with CN verification (GetCertNameVerifier).
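Concurrency is bounded by max-serving-threads. The webhook's failurePolicy defaults to Ignore (set via webhook annotations), so the API server admits a request when the webhook errors or times out; this prioritizes availability over enforcement.
Gatekeeper exempts requests from its own service account (gatekeeper-admin) from admission checks to avoid self-management loops.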
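
As a rough illustration of the CN check mentioned above, the following Python sketch compares a client certificate's common name with an expected value. It is not Gatekeeper's GetCertNameVerifier. The function names and the "kube-apiserver" value are assumptions made for the example, and the input dict follows the shape returned by Python's ssl.SSLSocket.getpeercert(). Chain validation against the configured CA is assumed to have already happened in the TLS layer.

```python
def peer_common_name(peer_cert):
    """Return the commonName from a dict shaped like ssl.SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def client_cn_matches(peer_cert, expected_cn):
    """Hypothetical helper: accept only a verified certificate with the expected CN."""
    # None means no certificate was presented; {} means it was not validated.
    if not peer_cert:
        return False
    return peer_common_name(peer_cert) == expected_cn


# Example input; the CN value is a placeholder, not a documented default.
example_cert = {
    "subject": (
        (("organizationName", "k8s"),),
        (("commonName", "kube-apiserver"),),
    )
}
```

A missing or unvalidated certificate is rejected before the name comparison, so a CN match alone is never treated as proof of identity.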

Risks: direct network access to the webhook could enable unauthenticated admission requests or DoS; large or complex objects can stress the policy engine. Mitigations include mTLS, RBAC, network policies, rate limits external to Gatekeeper, and concurrency caps.
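
The concurrency cap can be pictured with a small Python sketch. It is a conceptual stand-in for a setting like max-serving-threads, not Gatekeeper's code: the class name, the error string, and the choice to reject rather than queue when the cap is reached are all assumptions.

```python
import threading


class BoundedHandler:
    """Caps how many requests are evaluated at once (hypothetical helper)."""

    def __init__(self, max_threads, evaluate):
        self._slots = threading.BoundedSemaphore(max_threads)
        self._evaluate = evaluate

    def handle(self, request):
        # Take a slot without blocking; refuse instead of queueing without bound.
        if not self._slots.acquire(blocking=False):
            return {"allowed": None, "error": "too many in-flight requests"}
        try:
            return {"allowed": self._evaluate(request), "error": None}
        finally:
            # Always free the slot, even if evaluation raises.
            self._slots.release()
```

With a fail-open failure policy, an error response like the one above means the request is admitted unchecked, which is why the cap protects availability rather than enforcement.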

Policy engines (Rego and CEL)

ConstraintTemplates and Constraints are reconciled in pkg/controller/constrainttemplate/ and feed the OPA engine via the frameworks/constraint client.
Built-ins can be disabled with --disable-opa-builtin (see main.go), reducing dangerous capabilities.
The Kubernetes CEL driver in pkg/drivers/k8scel/ relies on Kubernetes’ CEL validation and strict cost limits.

Risks: malicious or inefficient policies can cause high CPU or memory usage, and referential lookups make decisions depend on cached data that may be stale. Because template creation is an admin-level action, it is typically trusted. The enable-referential-rules flag controls whether referential constraints are allowed; disabling it avoids the race conditions they introduce.

Mutation system

Mutation logic in pkg/mutation/ and the mutating webhook (pkg/webhook/mutation.go) can change incoming resources.
The system enforces convergence to avoid infinite mutation loops (returning ErrNotConverging when mutators never settle) and detects schema conflicts between mutators.
External data placeholders are resolved in pkg/mutation/system_external_data.go with failure policies (fail/ignore/default).

Risks: mutations can unintentionally weaken security or inject malicious values, but mutator creation is admin-only. External data can propagate untrusted content; use failure policies and schema validation.
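
The convergence requirement can be sketched as a fixed-point loop: mutators are reapplied until a full pass leaves the object unchanged, and the loop gives up after a bounded number of passes. This Python sketch is illustrative only. NotConvergingError stands in for ErrNotConverging, and the iteration limit of 10 is an assumed value.

```python
import copy


class NotConvergingError(Exception):
    """Raised when mutators keep changing the object (stand-in for ErrNotConverging)."""


def mutate_until_stable(obj, mutators, max_iterations=10):
    """Apply every mutator repeatedly until a full pass changes nothing."""
    current = copy.deepcopy(obj)
    for _ in range(max_iterations):
        before = copy.deepcopy(current)
        for mutate in mutators:
            current = mutate(current)
        if current == before:
            return current  # fixed point reached
    raise NotConvergingError(f"no fixed point after {max_iterations} iterations")


def set_pull_policy(obj):
    """Example mutator that settles after one pass."""
    updated = dict(obj)
    updated["imagePullPolicy"] = "Always"
    return updated


def flip_flag(obj):
    """Example mutator that never settles because it toggles a value."""
    updated = dict(obj)
    updated["flag"] = not updated.get("flag", False)
    return updated
```

Calling mutate_until_stable with set_pull_policy returns the mutated object on the second pass, while flip_flag raises NotConvergingError because no pass ever leaves the object unchanged.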

External data providers

Provider CRDs are managed by pkg/controller/externaldata/; network requests and response validation are in pkg/mutation/system_external_data.go.
When external data is enabled, Gatekeeper presents an mTLS client certificate to providers, validates response idempotency, applies timeouts, and supports response caching.

Risks: SSRF or data exfiltration if a provider points to internal endpoints; malicious providers can influence admission/mutation results. Mitigate with RBAC restrictions on Provider CRDs, network egress controls, and strict timeouts.
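
One way to reason about the SSRF risk is a pre-flight check on provider URLs, for example in a review script or a policy that vets Provider objects. The Python sketch below is an assumption-laden illustration, not part of Gatekeeper: the function name, the HTTPS-only rule, and the allowlist approach are all choices made for the example. It inspects only the literal host in the URL and does not resolve DNS, so network egress controls are still needed.

```python
import ipaddress
from urllib.parse import urlsplit


def provider_url_allowed(url, allowed_hosts):
    """Hypothetical check: HTTPS only, no internal literal IPs, host on an allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    host = parts.hostname.lower()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a DNS name, not a literal IP address
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        # Literal internal addresses are refused even if someone allowlists them.
        return False
    return host in allowed_hosts
```

For instance, a URL pointing at a link-local metadata address such as 169.254.169.254 is rejected regardless of the allowlist, while a named in-cluster service passes only when it has been explicitly listed.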

Audit and export

Audit runs in pkg/audit/manager.go, periodically listing resources and generating constraint violations.
Audit result limits (constraint-violations-limit) and chunking reduce memory pressure.
Export is handled via pkg/export/ with disk (pkg/export/disk/disk.go) and Dapr drivers (pkg/export/dapr/dapr.go).

Risks: audit can leak sensitive metadata via logs/events or exports; misconfigured disk paths can lead to overwriting or denial-of-service on storage. These features are operator-controlled; use least-privilege export locations and restrict access to audit events.
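
The effect of a violations limit such as constraint-violations-limit can be sketched as follows: the audit keeps counting every violation but stores details for only the first few, so memory and status size stay bounded. This Python sketch is illustrative; the function name and the result keys are assumptions made for the example.

```python
def cap_violations(violations, limit):
    """Count every violation but retain details for at most `limit` of them."""
    kept = []
    total = 0
    for violation in violations:  # accepts any iterable, including a generator
        total += 1
        if len(kept) < limit:
            kept.append(violation)
    return {"totalViolations": total, "violations": kept}
```

Because the input is consumed as a stream, the full violation list never has to be held in memory; only the count and the capped sample are retained.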

Sync/caches and referential data

Cache/sync systems (pkg/cachemanager/, pkg/watch/, pkg/target/) store Kubernetes objects for evaluation.
Stale cache data can lead to inconsistent enforcement; audit-from-cache and referential rules are optional and should be evaluated for consistency needs.

Webhook configuration and readiness

pkg/controller/webhookconfig/ updates webhook matching and CA bundles; errors can disable policy enforcement.
TLS health checks (pkg/webhook/health_check.go) validate certs locally and use insecure TLS only for internal checks.

Metrics/health/pprof endpoints

The health endpoint defaults to :9090 and metrics can be enabled. pprof is optional and bound to localhost.
These endpoints may reveal internal state; use network policies or bind to localhost when possible.

CLI tooling and OCI bundles

cmd/gator/ and pkg/oci/oci.go pull policy bundles from OCI registries or local files. These are developer tools; vulnerabilities here primarily affect local environments and CI pipelines.

Attacker stories

A cluster tenant submits resources crafted to trigger worst-case policy evaluation, causing admission latency spikes or webhook timeouts (DoS). Mitigate via policy review, resource limits, and max-serving-threads.
An attacker with network access to the webhook sends unauthenticated requests, inducing CPU load or bypassing API server controls. Mitigate via client-ca-name, network policies, and service exposure restrictions.
A compromised external data provider returns malicious values that mutate or validate resources incorrectly. Mitigate via RBAC on Provider CRDs, TLS/mTLS, strict timeouts, and failure policies.
An operator misconfigures the Config CRD to exclude namespaces (pkg/controller/config/process/excluder.go), unintentionally allowing policy bypass in sensitive namespaces.
A malicious admin creates a ConstraintTemplate with Rego that performs expensive lookups or leaks data. This is an admin-level action and is generally out of scope for untrusted users.
Audit export configured to a shared disk path leaks violation data to other workloads; mitigate by isolating volumes and limiting export features.
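
The namespace-exclusion story can be made concrete with a small review helper that reports which sensitive namespaces an exclusion list would skip. This Python sketch is hypothetical: the function names are invented, and the wildcard semantics (a leading or trailing "*" only) are an assumption that should be checked against the Config CRD behavior in the Gatekeeper version in use.

```python
def pattern_matches(pattern, namespace):
    """Match an exact name, a 'prefix-*' pattern, or a '*-suffix' pattern (assumed semantics)."""
    if pattern.endswith("*"):
        return namespace.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return namespace.endswith(pattern[1:])
    return pattern == namespace


def is_excluded(namespace, patterns):
    """True if any exclusion pattern covers the namespace."""
    return any(pattern_matches(p, namespace) for p in patterns)


def sensitive_exclusions(patterns, sensitive_namespaces):
    """List the sensitive namespaces that the exclusion patterns would skip."""
    return [ns for ns in sensitive_namespaces if is_excluded(ns, patterns)]
```

Running sensitive_exclusions over a proposed exclusion list before applying it gives operators a quick way to spot a broad pattern that unintentionally removes enforcement from namespaces that matter.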

Out-of-scope/low-relevance classes

Classic web vulnerabilities (XSS/CSRF/SQLi) are largely inapplicable because Gatekeeper is a backend controller without a web UI.
Session management and cookies are not used.

Criticality calibration (critical, high, medium, low)

Critical

Remote code execution or arbitrary file write in the Gatekeeper pod leading to cluster compromise or access to the service account.
Authentication bypass allowing untrusted network clients to spoof AdmissionReview traffic and approve or mutate resources.
Vulnerabilities that let unprivileged users read or modify sensitive cluster resources via Gatekeeper’s elevated permissions.

High

Admission policy bypass for attacker-controlled resources (e.g., logic flaw allowing constraints to be skipped).
SSRF or data exfiltration through external data provider requests when attackers can influence Provider config.
Denial-of-service that blocks the API server or causes consistent webhook timeouts across the cluster.

Medium

Information disclosure through audit exports, logs, or metrics that expose object metadata or violation details.
Path manipulation or disk exhaustion via export configuration that requires operator access.
Cache inconsistency or referential rule races leading to occasional false allow/deny decisions.

Low

Issues limited to the gator CLI or developer tooling (e.g., unsafe OCI pull behavior) that require local execution.
Minor log injection or error-handling bugs without security impact.
Misconfiguration risks that require full cluster-admin access and do not extend beyond that privilege level.

sozercan/gk-threat-model.md
