Gatekeeper is a Kubernetes admission controller and policy enforcement system built on Open Policy Agent (OPA). It runs as a controller manager inside the cluster, exposes validating and mutating webhooks, reconciles CRDs for ConstraintTemplates, Constraints, Mutators, and external data Providers, performs periodic audits, and optionally exports audit results. Runtime entrypoints are in main.go, with core admission logic in pkg/webhook/ and policy engines in the OPA and CEL drivers. A separate CLI (cmd/gator) supports local testing and OCI bundle ingestion.
Gatekeeper is security-critical because it sits in the admission path for all Kubernetes resource creation/updates. It typically runs with broad read access to cluster objects and with permissions to manage webhook configuration, CRDs, and status resources. Misconfiguration or vulnerabilities can cause policy bypass, denial of service, or cluster-wide impact.
Trust boundaries
- Kubernetes API server → Gatekeeper webhook: AdmissionReview requests are derived from untrusted user inputs but are delivered by the API server over TLS.
- Cluster state/CRDs: ConstraintTemplates, Constraints, Mutators, Config, SyncSets, and Provider CRDs are generally operator-controlled.
- External data providers: network calls to external services and their responses cross the cluster boundary.
- Export destinations: disk paths and Dapr components are external sinks.
- Metrics/health endpoints: cluster-internal HTTP surfaces.
- CLI/gator and build tooling: developer-controlled execution environment.
Attacker-controlled inputs
- Resource manifests submitted to the Kubernetes API (admission payloads, including arbitrary JSON objects, labels, annotations, etc.).
- Any object content referenced by referential constraints or audit.
- Potentially external data provider responses if the provider is malicious or compromised.
Operator-controlled inputs
- Gatekeeper configuration flags and CRDs (Config, SyncSet, Provider, Mutators, ConstraintTemplates, Constraints).
- Webhook configuration and certificate settings (client-ca-name, TLS version, cert rotation settings).
- Export destinations and audit settings.
Developer-controlled inputs
- CLI gator policy bundles (including OCI image references), test fixtures, CI tooling.
Assumptions
- Kubernetes API server authentication/authorization is correctly enforced; only trusted operators can create or modify Gatekeeper CRDs and webhook configuration.
- Network policies or cluster topology restrict external access to the webhook, metrics, and health endpoints.
- Gatekeeper’s service account and secret volumes are protected; compromise of these is out of scope.
- External data providers and export endpoints are configured by trusted administrators.
- Webhook endpoints /v1/admit and /v1/mutate are registered in pkg/webhook/policy.go and pkg/webhook/mutation.go.
- TLS configuration and cert rotation live in main.go and pkg/webhook/common.go; minimum TLS version is configurable, and client-ca-name can enforce client cert validation with CN verification (GetCertNameVerifier).
- Concurrency is bounded by max-serving-threads, and error handling returns failurePolicy=ignore by default (webhook annotations), prioritizing availability.
- Gatekeeper bypasses its own service account (gatekeeper-admin) to avoid self-management loops.
Risks: direct network access to the webhook could enable unauthenticated admission requests or DoS; large or complex objects can stress the policy engine. Mitigations include mTLS, RBAC, network policies, rate limits external to Gatekeeper, and concurrency caps.
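The interaction between a concurrency cap and a fail-open default can be sketched as follows. This is a minimal illustration, not Gatekeeper's code: the class name BoundedAdmission and its parameters are hypothetical, and in a real cluster the failure policy is applied by the API server when the webhook call fails, not by the webhook itself. The sketch folds both into one class only to show the trade-off.

```python
import threading
from typing import Callable


class BoundedAdmission:
    """Hypothetical model: caps concurrent evaluations and applies a failure policy."""

    def __init__(self, max_threads: int, failure_policy: str = "ignore") -> None:
        if failure_policy not in ("ignore", "fail"):
            raise ValueError("failure_policy must be 'ignore' or 'fail'")
        # At most max_threads evaluations may hold a slot at the same time.
        self._slots = threading.BoundedSemaphore(max_threads)
        self._failure_policy = failure_policy

    def review(self, evaluate: Callable[[dict], bool], request: dict) -> dict:
        # Block until a slot is free, then evaluate the request.
        with self._slots:
            try:
                return {"allowed": evaluate(request), "reason": "evaluated"}
            except Exception as exc:
                # "ignore" fails open (availability first); "fail" fails closed.
                return {
                    "allowed": self._failure_policy == "ignore",
                    "reason": f"evaluation error: {exc}",
                }
```

With the "ignore" policy, an evaluation error admits the request, which is why a denial of service against the policy engine can turn into a policy bypass; with "fail", the same error blocks the request instead.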
- ConstraintTemplates and Constraints are reconciled in pkg/controller/constrainttemplate/ and feed the OPA engine via the frameworks/constraint client.
- Built-ins can be disabled with --disable-opa-builtin (see main.go), reducing dangerous capabilities.
- The Kubernetes CEL driver in pkg/drivers/k8scel/ relies on Kubernetes’ CEL validation and strict cost limits.
Risks: malicious or inefficient policies can cause high CPU/memory usage, or use referential lookups for data-dependent decisions. Because template creation is admin-level, this is typically a trusted action. The enable-referential-rules flag limits use of referential constraints to avoid race conditions.
- Mutation logic in pkg/mutation/ and the mutating webhook (pkg/webhook/mutation.go) can change incoming resources.
- The system enforces convergence to avoid infinite mutation loops (ErrNotConverging), and respects schema conflicts.
- External data placeholders are resolved in pkg/mutation/system_external_data.go with failure policies (fail/ignore/default).
Risks: mutations can unintentionally weaken security or inject malicious values, but mutator creation is admin-only. External data can propagate untrusted content; use failure policies and schema validation.
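The convergence requirement on mutation can be illustrated with a small fixed-point loop. This is a sketch under stated assumptions, not the logic in pkg/mutation/: the function mutate_until_stable, the NotConvergingError exception, and the max_iterations cap are hypothetical names chosen to mirror the idea behind ErrNotConverging.

```python
import copy
from typing import Callable, List

Mutator = Callable[[dict], dict]


class NotConvergingError(Exception):
    """Raised when mutators keep changing the object past the iteration cap."""


def mutate_until_stable(obj: dict, mutators: List[Mutator],
                        max_iterations: int = 10) -> dict:
    current = copy.deepcopy(obj)
    for _ in range(max_iterations):
        # Snapshot the object so in-place edits by a mutator are still detected.
        before = copy.deepcopy(current)
        for mutator in mutators:
            current = mutator(current)
        if current == before:
            # A full pass changed nothing: the object has reached a fixed point.
            return current
    raise NotConvergingError(
        f"object still changing after {max_iterations} iterations"
    )
```

An idempotent mutator, such as one that sets a fixed label, stabilizes on the second pass; a mutator that increments a counter on every pass never reaches a fixed point and triggers the error rather than looping forever.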
- Provider CRDs are managed by pkg/controller/externaldata/; network requests and response validation are in pkg/mutation/system_external_data.go.
- Gatekeeper uses mTLS client certificates when external data is enabled, validates idempotent responses, applies timeouts, and supports response caching.
Risks: SSRF or data exfiltration if a provider points to internal endpoints; malicious providers can influence admission/mutation results. Mitigate with RBAC restrictions on Provider CRDs, network egress controls, and strict timeouts.
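As one example of a review step for Provider definitions, an operator-side check might flag provider URLs that are not HTTPS or that point at loopback or link-local addresses (the latter includes common cloud metadata endpoints). The function provider_url_problems is hypothetical and is not part of Gatekeeper. The sketch inspects only IP literals, so hostnames that resolve to internal addresses would still need DNS-aware checks or network egress controls.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def provider_url_problems(url: str) -> List[str]:
    """Return a list of concerns about a provider URL (empty if none found)."""
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme is not https")
    host = parsed.hostname
    if not host:
        problems.append("missing host")
        return problems
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name, not an IP literal; resolution is out of scope here.
        return problems
    if addr.is_loopback or addr.is_link_local:
        problems.append(f"{addr} is a loopback or link-local address")
    return problems
```

Private ranges are deliberately not flagged, because in-cluster providers normally sit on private addresses; whether to tighten that is an environment-specific decision.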
- Audit runs in pkg/audit/manager.go, periodically listing resources and generating constraint violations.
- Audit result limits (constraint-violations-limit) and chunking reduce memory pressure.
- Export is handled via pkg/export/ with disk (pkg/export/disk/disk.go) and Dapr drivers (pkg/export/dapr/dapr.go).
Risks: audit can leak sensitive metadata via logs/events or exports; misconfigured disk paths can lead to overwriting or denial-of-service on storage. These features are operator-controlled; use least-privilege export locations and restrict access to audit events.
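The way a violations limit and chunked listing bound memory during audit can be sketched as below. This is an illustrative model only, not the code in pkg/audit/manager.go: the functions chunked and audit, the violates callback, and the default values are hypothetical placeholders.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` items so only one chunk is held at a time."""
    chunk: List = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def audit(resources: Iterable[dict], violates: Callable[[dict], bool],
          chunk_size: int = 500, violations_limit: int = 20) -> dict:
    total = 0
    kept: List[str] = []
    for chunk in chunked(resources, chunk_size):
        for obj in chunk:
            if violates(obj):
                total += 1
                # Count every violation, but store details only up to the limit.
                if len(kept) < violations_limit:
                    kept.append(obj["name"])
    return {"totalViolations": total, "violations": kept}
```

For instance, auditing ten pods of which five are privileged with violations_limit=2 reports a total of five violations but retains details for only the first two, so the stored result stays small regardless of how many resources violate the constraint.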
- Cache/sync systems (pkg/cachemanager/, pkg/watch/, pkg/target/) store Kubernetes objects for evaluation.
- Stale cache data can lead to inconsistent enforcement; audit-from-cache and referential rules are optional and should be evaluated for consistency needs.
- pkg/controller/webhookconfig/ updates webhook matching and CA bundles; errors can disable policy enforcement.
- TLS health checks (pkg/webhook/health_check.go) validate certs locally and use insecure TLS only for internal checks.
- Health endpoint defaults to :9090, metrics can be enabled; pprof is optional and bound to localhost.
- These endpoints may reveal internal state; use network policies or bind to localhost when possible.
cmd/gator/ and pkg/oci/oci.go pull policy bundles from OCI registries or local files. These are developer tools; vulnerabilities here primarily affect local environments and CI pipelines.
Attacker stories
- A cluster tenant submits resources crafted to trigger worst-case policy evaluation, causing admission latency spikes or webhook timeouts (DoS). Mitigate via policy review, resource limits, and max-serving-threads.
- An attacker with network access to the webhook sends unauthenticated requests, inducing CPU load or bypassing API server controls. Mitigate via client-ca-name, network policies, and service exposure restrictions.
- A compromised external data provider returns malicious values that mutate or validate resources incorrectly. Mitigate via RBAC on Provider CRDs, TLS/mTLS, strict timeouts, and failure policies.
- An operator misconfigures the Config CRD to exclude namespaces (pkg/controller/config/process/excluder.go), unintentionally allowing policy bypass in sensitive namespaces.
- A malicious admin creates a ConstraintTemplate with Rego that performs expensive lookups or leaks data. This is an admin-level action and is generally out of scope for untrusted users.
- Audit export configured to a shared disk path leaks violation data to other workloads; mitigate by isolating volumes and limiting export features.
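The namespace-exclusion story above lends itself to a simple pre-deployment check: given the exclusion patterns an operator intends to apply, list which sensitive namespaces they would cover. The sketch below assumes glob-style matching via Python's fnmatch purely for illustration; the matching rules in pkg/controller/config/process/excluder.go may differ, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def is_excluded(namespace: str, patterns: Sequence[str]) -> bool:
    """True if any exclusion pattern matches the namespace (case-sensitive glob)."""
    return any(fnmatchcase(namespace, pattern) for pattern in patterns)


def unexpectedly_excluded(patterns: Sequence[str],
                          sensitive: Sequence[str]) -> List[str]:
    """Return the sensitive namespaces that the patterns would exempt from policy."""
    return sorted(ns for ns in sensitive if is_excluded(ns, patterns))
```

Under these assumptions, a pattern list such as ["kube-*", "dev"] checked against the sensitive namespaces kube-system, payments, and dev would flag dev and kube-system, which gives a reviewer a concrete list to question before the configuration is applied.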
Out-of-scope/low-relevance classes
- Classic web vulnerabilities (XSS/CSRF/SQLi) are largely inapplicable because Gatekeeper is a backend controller without a web UI.
- Session management and cookies are not used.
Critical
- Remote code execution or arbitrary file write in the Gatekeeper pod leading to cluster compromise or access to the service account.
- Authentication bypass allowing untrusted network clients to spoof AdmissionReview traffic and approve or mutate resources.
- Vulnerabilities that let unprivileged users read or modify sensitive cluster resources via Gatekeeper’s elevated permissions.
High
- Admission policy bypass for attacker-controlled resources (e.g., logic flaw allowing constraints to be skipped).
- SSRF or data exfiltration through external data provider requests when attackers can influence Provider config.
- Denial-of-service that blocks the API server or causes consistent webhook timeouts across the cluster.
Medium
- Information disclosure through audit exports, logs, or metrics that expose object metadata or violation details.
- Path manipulation or disk exhaustion via export configuration that requires operator access.
- Cache inconsistency or referential rule races leading to occasional false allow/deny decisions.
Low
- Issues limited to the gator CLI or developer tooling (e.g., unsafe OCI pull behavior) that require local execution.
- Minor log injection or error-handling bugs without security impact.
- Misconfiguration risks that require full cluster-admin access and do not extend beyond that privilege level.