Kubernetes Security Findings — May 2026

Repository: kubernetes/kubernetes
Commit: 47f990437458a2b171f51b5e97a0c28c81d949d1 (master, 2026-05-05)
Methods: Static multi-agent source review (87 files across 4 researchers) + dynamic execution harness (kubectl, 3 agents)
Subsystems: authentication, authorization/RBAC, admission control/webhooks, node authorization (NodeAuthorizer + DRA graph)


Executive Summary

13 security-relevant findings were identified across four subsystems:

Severity   Count   Highest-impact subsystems
MEDIUM     3       Admission (quota bypass), Node auth (DRA graph), Webhook predicates
LOW        10      Authn, Authz/CEL, Admission (dispatcher, validator, reinvocation, namespace), Node/RBAC

No CRITICAL or HIGH findings confirmed. The three MEDIUM findings are:

  1. Concurrent ResourceQuota bypass (controller.go:228) — an acknowledged comment in the code confirms the retry logic can evaluate quota against a state that "never actually exists". Authenticated attackers with CREATE rights can transiently burst through quota limits.
  2. DRA node graph misses ExtendedResourceClaim edges (graph_populator.go:112) — PodStatusEqual only compares ResourceClaimStatuses, so when ExtendedResourceClaimStatus changes independently, the fast-path skips AddPod and the extendedClaim → pod → node graph edge is never added. AddPod explicitly handles this field at graph.go:431 — the fast-path guard doesn't know about it.
  3. Webhook wildcard resources: ["*"] silently excludes subresources (rules.go:106) — operator-level misconfiguration that renders admission security controls ineffective against pods/exec, pods/log, and similar subresource operations. The wildcard parsing logic confirms this: sub == "*" only matches when the rule entry itself contains a / separator.

The dynamic kubectl harness is still running. No crashes have landed. Quantity parsing is correctly rejecting extreme exponents with error exits (not panics). kubectl cp path traversal exploration was in progress at report time.


Methodology

Static Analysis

Four parallel researchers analyzed ~87 files across security-critical subsystems:

Researcher   Subsystem                      Files   Focus
R1           Authentication                 26      plugin/pkg/auth/authenticator/, staging/.../authentication/
R2           Authorization + RBAC           16      plugin/pkg/auth/authorizer/, pkg/registry/rbac/
R3           Admission control + webhooks   28      plugin/pkg/admission/, staging/.../admission/plugin/
R4           K8s cross-cutting specialist   17      NodeAuthorizer, graph populator, CEL authz, webhook predicates

Each researcher read source files directly and filed structured findings. A deduplication pass removed two overlapping observations. Findings marked with [code verified] have been confirmed against the actual source at the commit listed above.

Dynamic Harness

A kubectl binary was cross-compiled (Linux/arm64, CGO_ENABLED=0) from commit 47f9904 and packaged into a Docker image (vuln-harness-kubectl-k8s:latest) with relevant source. Three find-agents ran in parallel:

Agent     Focus                                          Preliminary signal
run_000   kubeconfig parsing, client cert handling       Still exploring
run_001   Resource quantity parsing (integer overflow)   1e9223372036854775807 → clean "unable to parse" error, exit 1
run_002   kubectl cp tar extraction (path traversal)     Reading cp.go source for attack vector analysis

Findings — MEDIUM


MEDIUM-1 — Concurrent ResourceQuota Bypass [code verified]

File staging/src/k8s.io/apiserver/pkg/admission/plugin/resourcequota/controller.go
Lines 50–341 (quotaEvaluator struct + checkQuotas); bug acknowledged at line 315
Confidence 0.65
CVSS estimate AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:N ~3.7
Exploit difficulty Medium — requires controlled request timing; most effective in multi-master clusters

The bug — in the code's own words:

// controller.go:315–317
// this retry logic has the same bug that its possible to be checking against quota
// in a state that never actually exists where you've added a new documented, then
// updated an old one, your resource matches both and you're only checking one

This comment has existed for years. The Kubernetes team is aware of it.

Architecture: how quota evaluation is batched

Every admission request that touches a quota-tracked resource goes through quotaEvaluator.Evaluate(), which adds an admissionWaiter to a namespace-keyed work queue and blocks:

// controller.go:650–685
func (e *quotaEvaluator) Evaluate(a admission.Attributes) error {
    e.init.Do(e.start)
    // ...
    waiter := newAdmissionWaiter(a)
    e.addWork(waiter)
    select {
    case <-waiter.finished:          // unblocks when checkQuotas closes this channel
    case <-time.After(10 * time.Second):
        return apierrors.NewInternalError(...)
    }
    return waiter.result
}

addWork places the waiter into one of two maps keyed by namespace:

// controller.go:688–702
func (e *quotaEvaluator) addWork(a *admissionWaiter) {
    e.workLock.Lock()
    defer e.workLock.Unlock()
    ns := a.attributes.GetNamespace()
    e.queue.Add(ns)
    if e.inProgress.Has(ns) {
        e.dirtyWork[ns] = append(e.dirtyWork[ns], a)  // arrives while batch is running
        return
    }
    e.work[ns] = append(e.work[ns], a)                // normal path
}

A pool of goroutines (doWork) continuously drains work[ns] batches. When a batch for namespace X is dequeued with getWork(), the namespace is added to inProgress and all subsequent requests for X land in dirtyWork[ns] until completeWork swaps dirty→work. Crucially, requests that arrive during an in-progress batch form the NEXT batch; they do not get a second bite at the current evaluation.
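A condensed sketch of that batch lifecycle (paraphrased from the same file, bodies elided to the calls that matter — not a verbatim copy):

// condensed paraphrase of controller.go (doWork / getWork / completeWork)
func (e *quotaEvaluator) doWork() {
    for {
        ns, admissionAttributes, quit := e.getWork() // moves work[ns] out; adds ns to inProgress
        if quit {
            return
        }
        e.checkAttributes(ns, admissionAttributes)   // one GetQuotas + serial checkQuotas for the batch
        e.completeWork(ns)                           // promotes dirtyWork[ns] → work[ns]; removes ns from inProgress
    }
}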

Within a single batch: correct behavior

checkAttributes reads all quota objects for the namespace once at the start, then calls checkQuotas with remainingRetries=3:

// controller.go:184–213
func (e *quotaEvaluator) checkAttributes(ns string, admissionAttributes []*admissionWaiter) {
    // ...
    quotas, err := e.quotaAccessor.GetQuotas(ns)  // one read for the whole batch
    // ...
    e.checkQuotas(quotas, admissionAttributes, 3)
}

Inside checkQuotas, requests are evaluated serially against a running quotas slice. Each admitted request updates quotas = newQuotas so the next one sees the incremented usage:

// controller.go:236–266
for i := range admissionAttributes {
    newQuotas, err := e.checkRequest(quotas, admissionAttribute.attributes)
    // ...
    quotas = newQuotas   // carries forward: request N+1 sees N's usage
}

Within one batch this is correct: if quota is 8/10 and 5 requests arrive, the first two are admitted (8→9→10), the next three are denied (10/10 full). No over-admission.

The retry path: where it breaks

After the sequential check, each quota that changed is written to etcd via UpdateQuotaStatus:

// controller.go:288
if err := e.quotaAccessor.UpdateQuotaStatus(&newQuota); err != nil {
    updatedFailedQuotas = append(updatedFailedQuotas, newQuota)
    lastErr = err
}

UpdateQuotaStatus issues a CoreV1().ResourceQuotas().UpdateStatus() call. The API server's storage layer enforces optimistic concurrency via the resourceVersion field: if another API server replica wrote to the same quota object between the initial GetQuotas and this UpdateStatus, the call returns a 409 Conflict. This is the normal mechanism that prevents lost updates.

On conflict, checkQuotas re-fetches the quota and recurses:

// controller.go:319–341
newQuotas, err := e.quotaAccessor.GetQuotas(quotas[0].Namespace)
// ...
quotasToCheck := []corev1.ResourceQuota{}
for _, newQuota := range newQuotas {
    for _, oldQuota := range updatedFailedQuotas {
        if newQuota.Name == oldQuota.Name {
            quotasToCheck = append(quotasToCheck, newQuota)
        }
    }
}
e.checkQuotas(quotasToCheck, admissionAttributes, remainingRetries-1)

The problem: GetQuotas reads from the informer watch cache (the lister at line 113), not directly from etcd. The watch cache has an inherent propagation lag — it reflects etcd state as of the last watch event, which may be hundreds of milliseconds stale. Consider a multi-master cluster with three API server replicas, each maintaining its own watch cache:

API server A: reads quota 8/10 (informer), admits 2, tries UpdateStatus → CONFLICT (B wrote first)
API server B: reads quota 8/10 (informer), admits 2, UpdateStatus succeeds → etcd now 10/10
API server C: reads quota 8/10 (informer), admits 2, tries UpdateStatus → CONFLICT (B wrote first)

A retries: GetQuotas → informer still shows 8/10 (watch not propagated yet)
           → checkQuotas re-runs both admitted requests → admits both again
           → UpdateStatus → 12/10 if it wins, or another conflict and another retry
C retries: same scenario

Once retries are exhausted (remainingRetries starts at 3 and decrements to 0), all still-pending defaultDeny results receive lastErr — but requests already cleared from defaultDeny (i.e., those admitted in a prior retry round) are NOT reverted. The admissionWaiter.result for admitted requests stays nil.

Exploit scenario (multi-master)

Setup: 3 API server replicas, namespace quota pods=10, currently 8 used.
Attacker (with create pods permission) fires a burst of 6 concurrent requests,
2 landing on each API server.

Each replica's quotaEvaluator:
  A: batch [req1, req2] → checkRequest(8/10)→admits req1 (9/10), admits req2 (10/10)
  B: batch [req3, req4] → checkRequest(8/10)→admits req3 (9/10), admits req4 (10/10)
  C: batch [req5, req6] → checkRequest(8/10)→admits req5 (9/10), admits req6 (10/10)

UpdateStatus race: one replica wins (e.g. B → etcd: 10/10), A and C get 409.

A retry: GetQuotas → watch cache still 8/10 → re-admits req1, req2 again.
C retry: GetQuotas → watch cache still 8/10 → re-admits req5, req6 again.

Result: up to 14 pods in a namespace with quota=10.
(In practice, some retries will also conflict, limiting over-admission.)

Practical impact

  • Over-admitted pods are not cleaned up — garbage collection does not remove pods that exceeded quota after the fact. Only explicit deletion removes them.
  • The over-admission is bounded by remainingRetries=3 and the number of concurrent API server replicas. Realistic over-admission: quota_limit × number_of_api_servers.
  • Only affects resources tracked by ResourceQuota (pods, services, PVCs, ConfigMaps, etc.)
  • The attacker needs only create permission on a quota-tracked resource — a low bar in shared-cluster environments.

Recommended fix: Change UpdateQuotaStatus to use a conditional update that verifies the resourceVersion returned from the initial GetQuotas read. If the version has advanced (another replica wrote), treat it as a definitive conflict rather than a retriable error, and deny the entire batch rather than re-evaluating with a stale baseline.
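A hedged sketch of that shape (hypothetical helper; the real change belongs in the quotaAccessor and must coexist with the informer read path):

// Hypothetical sketch: pin UpdateStatus to the resourceVersion observed by the
// initial GetQuotas, and treat a 409 as definitive instead of retriable.
func updateQuotaStatusPinned(ctx context.Context, client kubernetes.Interface,
    baseline, newQuota *corev1.ResourceQuota) error {
    newQuota.ResourceVersion = baseline.ResourceVersion // conditional write against the observed version
    _, err := client.CoreV1().ResourceQuotas(newQuota.Namespace).
        UpdateStatus(ctx, newQuota, metav1.UpdateOptions{})
    if apierrors.IsConflict(err) {
        // another replica wrote first: deny the batch rather than
        // re-evaluating against a possibly stale watch cache
        return fmt.Errorf("quota %s changed since evaluation began: %w", newQuota.Name, err)
    }
    return err
}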

Prior art

No public tracking issue or PR found as of 2026-05-05. The acknowledged comment at controller.go:315 has existed for years without a filed issue.


MEDIUM-2 — DRA Node Graph Misses ExtendedResourceClaim Edges [code verified]

File plugin/pkg/auth/authorizer/node/graph_populator.go:102–125
Graph edge added by plugin/pkg/auth/authorizer/node/graph.go:431–434
Feature gate DRAExtendedResource — Alpha in 1.34, Beta (default: true) in 1.36
Confidence 0.90 — three-file cross-reference confirms the gap
CVSS estimate AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H ~4.9
Exploit difficulty N/A — this is a false-negative (too-restrictive authorization), not privilege escalation

Background: what DRAExtendedResource is

Before DRA, GPUs were exposed to Kubernetes via the device plugin API: a per-node kubelet mechanism where the node agent tracked inventory and allocations locally. The DRAExtendedResource feature gate bridges the two worlds. When enabled, a pod requesting an extended resource like resources.limits: nvidia.com/gpu: "1" can have that request fulfilled via DRA rather than the legacy device plugin path — without changing the pod spec.

When the scheduler places such a pod, it synthesizes a ResourceClaim object whose name it writes into pod.Status.ExtendedResourceClaimStatus.ResourceClaimName. This claim is NOT listed in pod.Spec.ResourceClaims; it is an out-of-band scheduler artifact. Kubelet must read this claim to call NodePrepareResources on the DRA driver, which is the call that actually makes the device available to the container.

The PodStatus struct has two distinct DRA-related fields:

// staging/src/k8s.io/api/core/v1/types.go:5452–5456
ResourceClaimStatuses        []PodResourceClaimStatus         // standard DRA: from spec.ResourceClaims
ExtendedResourceClaimStatus  *PodExtendedResourceClaimStatus  // extended resource DRA: scheduler-synthesized
                                                               // +featureGate=DRAExtendedResource

The dependency chain

kubelet wants to start GPU pod
  → reads pod.Status.ExtendedResourceClaimStatus.ResourceClaimName ("pod-abc-gpu-claim")
  → calls NodePrepareResources on DRA driver with that claim
    → must first GET resourceclaims/pod-abc-gpu-claim from API server
      → NodeAuthorizer checks hasPathFrom("node-A", resourceClaimVertexType, ns, "pod-abc-gpu-claim")
        → checks the in-memory graph for edge: extendedClaim → pod → node
          → edge was never added because AddPod() was skipped by the fast-path
            → hasPathFrom returns false → 403 Forbidden
              → NodePrepareResources never called → pod stuck in ContainerCreating

Confirmed in pkg/kubelet/cm/dra/manager.go:255–263:

if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.DRAExtendedResource) {
    if pod.Status.ExtendedResourceClaimStatus != nil {
        extendedResourceClaim := v1.PodResourceClaim{
            ResourceClaimName: &pod.Status.ExtendedResourceClaimStatus.ResourceClaimName,
        }
        podResourceClaims = append(podResourceClaims, extendedResourceClaim)
    }
}
// podResourceClaims is then iterated to call NodePrepareResources for each claim

The gap: AddPod knows about the field; the fast-path guard does not

AddPod in graph.go explicitly handles ExtendedResourceClaimStatus at line 431:

// graph.go:431–434
if pod.Status.ExtendedResourceClaimStatus != nil &&
    len(pod.Status.ExtendedResourceClaimStatus.ResourceClaimName) > 0 {
    claimVertex := g.getOrCreateVertexLocked(resourceClaimVertexType,
        pod.Namespace, pod.Status.ExtendedResourceClaimStatus.ResourceClaimName)
    g.addEdgeLocked(claimVertex, podVertex, nodeVertex)   // edge: extendedClaim → pod → node
}

But graph_populator.updatePod has a fast-path that skips calling AddPod when certain pod fields are unchanged. The guard uses resourceclaim.PodStatusEqual, which compares only ResourceClaimStatuses (the standard DRA field):

// graph_populator.go:109–118
if oldPod, ok := oldObj.(*corev1.Pod); ok && oldPod != nil {
    hasNewEphemeralContainers := len(pod.Spec.EphemeralContainers) > len(oldPod.Spec.EphemeralContainers)
    if (pod.Spec.NodeName == oldPod.Spec.NodeName) && (pod.UID == oldPod.UID) &&
        !hasNewEphemeralContainers &&
        resourceclaim.PodStatusEqual(                    // ← compares ResourceClaimStatuses only
            oldPod.Status.ResourceClaimStatuses,
            pod.Status.ResourceClaimStatuses) {
        return                                           // ← AddPod() never called
    }
}

PodStatusEqual in resourceclaim/pod.go:34–50 compares the Name and ResourceClaimName fields of []PodResourceClaimStatus. It has no knowledge of the separate *PodExtendedResourceClaimStatus pointer.
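Paraphrased for reference (condensed; ptr.Equal is from k8s.io/utils/ptr — see resourceclaim/pod.go for the exact code):

// condensed paraphrase of resourceclaim/pod.go — compares only the slice field
func PodStatusEqual(a, b []v1.PodResourceClaimStatus) bool {
    if len(a) != len(b) {
        return false
    }
    for i := range a {
        if a[i].Name != b[i].Name ||
            !ptr.Equal(a[i].ResourceClaimName, b[i].ResourceClaimName) {
            return false
        }
    }
    return true // note: PodStatusEqual(nil, nil) == true — exactly the T2 trigger below
}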

Event timeline for a GPU-only pod

A pod with limits: nvidia.com/gpu: "1" and no spec.ResourceClaims (the dominant real-world pattern for GPU workloads today):

T1  Scheduler binds pod to node-A (Spec.NodeName = "node-A")
    informer fires: updatePod(nil, pod)
    → oldPod is nil → fast-path not evaluated → AddPod() called ✓
    → but ExtendedResourceClaimStatus is still nil at T1 → no edge added (nothing to add yet)

T2  Scheduler creates synthetic ResourceClaim "pod-abc-gpu-claim" and writes status:
    pod.Status.ExtendedResourceClaimStatus = {ResourceClaimName: "pod-abc-gpu-claim"}
    informer fires: updatePod(oldPod_T1, newPod_T2)
    → old.Spec.NodeName == new.Spec.NodeName  ✓
    → old.UID == new.UID  ✓
    → !hasNewEphemeralContainers  ✓
    → PodStatusEqual(old.ResourceClaimStatuses, new.ResourceClaimStatuses)
      = PodStatusEqual(nil, nil)  →  true   ← both nil; pod has no spec.ResourceClaims
    → FAST-PATH FIRES → AddPod() is NOT called ✗
    → edge "pod-abc-gpu-claim" → pod → node-A NEVER added to graph

T3  kubelet calls GET resourceclaims/pod-abc-gpu-claim
    NodeAuthorizer: hasPathFrom("node-A", resourceClaimVertexType, ns, "pod-abc-gpu-claim")
    → startingVertex found in graph, but no edge to node-A vertex
    → returns false → DecisionNoOpinion → 403 Forbidden
    → kubelet DRA manager cannot call NodePrepareResources → pod stuck forever

Why this wasn't caught

The scheduler's own pod event handler — pkg/scheduler/framework/events.go:161–167 — already correctly uses both equality functions to detect changes:

func extractPodGeneratedResourceClaimChange(newPod *v1.Pod, oldPod *v1.Pod) fwk.ActionType {
    if !resourceclaim.PodStatusEqual(newPod.Status.ResourceClaimStatuses, oldPod.Status.ResourceClaimStatuses) ||
        !resourceclaim.PodExtendedStatusEqual(newPod.Status.ExtendedResourceClaimStatus, oldPod.Status.ExtendedResourceClaimStatus) {
        return fwk.UpdatePodGeneratedResourceClaim
    }
    return fwk.None
}

The graph populator was not updated to match when ExtendedResourceClaimStatus was added. It is a consistency gap between two independent pod-status watchers in the same binary.

Who is affected

Any cluster running Kubernetes 1.36+ (where DRAExtendedResource=true by default) with:

  • GPU or other extended resource workloads being migrated to the DRA path (NVIDIA GPU Operator DRA driver, Intel GPU DRA driver, etc.)
  • Pods that use resources.limits: vendor.com/device: "1" without explicit spec.ResourceClaims

Clusters on 1.34 or 1.35 with DRAExtendedResource=true set explicitly are also affected.

Recommended fix

One line, in graph_populator.go:

// Change the fast-path guard from:
resourceclaim.PodStatusEqual(
    oldPod.Status.ResourceClaimStatuses,
    pod.Status.ResourceClaimStatuses)

// To:
resourceclaim.PodStatusEqual(
    oldPod.Status.ResourceClaimStatuses,
    pod.Status.ResourceClaimStatuses) &&
resourceclaim.PodExtendedStatusEqual(
    oldPod.Status.ExtendedResourceClaimStatus,
    pod.Status.ExtendedResourceClaimStatus)

PodExtendedStatusEqual is already defined in staging/src/k8s.io/dynamic-resource-allocation/resourceclaim/pod.go:52. The scheduler already calls it for exactly this purpose.

Prior art

No public tracking issue or PR found as of 2026-05-05. Fix is available on branch fix/graph-populator-extended-resource-claim.


MEDIUM-3 — Webhook Wildcard resources: ["*"] Silently Excludes Subresources [code verified]

File staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/predicates/rules/rules.go:106–116
Test coverage rules_test.go:255–268 explicitly tests and expects this behavior (see below)
Confidence 0.90 — intended design, dangerous in practice, no API-server-level warning
CVSS estimate AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H ~8.5 if the webhook is a security control
Exploit difficulty Zero once the misconfiguration exists; requires operator error to set up

The matching logic

Matcher.Matches() calls five sub-matchers: scope, operation, group, version, resource. The resource() function is where subresources diverge from the intuitive wildcard behavior:

// rules.go:98–116
func splitResource(resSub string) (res, sub string) {
    parts := strings.SplitN(resSub, "/", 2)
    if len(parts) == 2 {
        return parts[0], parts[1]   // "pods/exec" → ("pods", "exec")
    }
    return parts[0], ""             // "*"         → ("*",    "")
}

func (r *Matcher) resource() bool {
    opRes, opSub := r.Attr.GetResource().Resource, r.Attr.GetSubresource()
    for _, res := range r.Rule.Resources {
        res, sub := splitResource(res)
        resMatch := res == "*" || res == opRes   // wildcard "*" matches any resource name
        subMatch := sub == "*" || sub == opSub   // BUT: sub="" only matches opSub=""
        if resMatch && subMatch {
            return true
        }
    }
    return false
}

For a rule entry of "*": sub = "", so subMatch = ("" == "*") || ("" == opSub).

For a pods/exec request, opSub = "exec", so subMatch = false || false = false. No match.
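The semantics are easy to reproduce outside the apiserver (a self-contained sketch with a hypothetical matches helper, not the real Matcher):

package main

import (
    "fmt"
    "strings"
)

// Standalone reproduction of the rules.go resource-matching semantics,
// using hypothetical minimal types instead of admission.Attributes.
func matches(ruleResources []string, opRes, opSub string) bool {
    for _, entry := range ruleResources {
        parts := strings.SplitN(entry, "/", 2)
        res, sub := parts[0], ""
        if len(parts) == 2 {
            sub = parts[1]
        }
        resMatch := res == "*" || res == opRes
        subMatch := sub == "*" || sub == opSub
        if resMatch && subMatch {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(matches([]string{"*"}, "pods", ""))            // true
    fmt.Println(matches([]string{"*"}, "pods", "exec"))        // false — the trap
    fmt.Println(matches([]string{"*", "*/*"}, "pods", "exec")) // true
}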

This is intentional and tested

The unit test at rules_test.go:255–268 documents this exact behavior as the expected contract:

"no subresources": {
    rule: adreg.RuleWithOperations{
        Rule: adreg.Rule{Resources: []string{"*"}},
    },
    match: attrList(
        a("g", "v", "r", "",     "name", admission.Create, ...),  // no subresource → MATCHES
        a("2", "v", "r2", "",    "name", admission.Create, ...),  // no subresource → MATCHES
    ),
    noMatch: attrList(
        a("g", "v", "r",  "exec",  "name", admission.Create, ...),  // subresource → NO MATCH
        a("2", "v", "r2", "proxy", "name", admission.Create, ...),  // subresource → NO MATCH
    ),
},

The behavior is not a bug in the matching code — it is specified, tested, and documented in the API reference. The problem is that operators routinely misread "*" as "everything" when it actually means "every resource with no subresource".

The full resource pattern taxonomy

Understanding what each pattern actually matches:

Rule entry    res      sub      Matches (resource, subresource)    Does NOT match
"*"           "*"      ""       (pods, ""), (services, "")         (pods, exec), (pods, log)
"*/*"         "*"      "*"      (pods, exec), (pods, log)          (pods, "") — no subresource
"pods"        "pods"   ""       (pods, "")                         (pods, exec)
"pods/*"      "pods"   "*"      (pods, exec), (pods, log)          (pods, ""), (services, exec)
"pods/exec"   "pods"   "exec"   (pods, exec)                       (pods, log), (pods, "")
"*/exec"      "*"      "exec"   (pods, exec), (services, exec)     (pods, log), (pods, "")

To cover all operations on all resources including all subresources, you need both:

resources: ["*", "*/*"]

Because neither "*" nor "*/*" alone covers both the resource-level and subresource-level operations simultaneously.

Subresource operations that bypass a resources: ["*"] webhook

Any operation where GetSubresource() returns a non-empty string is excluded:

Operation                  Subresource string    Who can call it
kubectl exec               exec                  Any user with pods/exec RBAC
kubectl logs               log                   Any user with pods/log RBAC
kubectl attach             attach                Any user with pods/attach RBAC
kubectl port-forward       portforward           Any user with pods/portforward RBAC
kubectl cp                 exec (streams)        Any user with pods/exec RBAC
Deployment scale           scale                 Any user with deployments/scale RBAC
Pod ephemeral containers   ephemeralcontainers   Any user with pods/ephemeralcontainers
Node proxy                 proxy                 Any user with nodes/proxy RBAC
Pod status update          status                Controllers, operators with pods/status
Token requests             token                 Any user with serviceaccounts/token

Why this is a live threat in real clusters

OPA/Gatekeeper, Kyverno, and Kubewarden all have community policy libraries with entries like:

# common in community policies — DOES NOT intercept exec/log/attach
rules:
- apiGroups: ["*"]
  apiVersions: ["*"]
  resources: ["*"]
  operations: ["CREATE", "UPDATE", "DELETE"]

An attacker with pods/exec permission (but blocked from creating new privileged pods by a Kyverno policy using this pattern) can exec into an existing pod and escape without the policy webhook ever firing. The policy author believed they covered all pod operations; exec bypasses it entirely.

Concrete scenario:

  1. Cluster has a Kyverno policy blocking privileged pod creation (resources: ["pods"]).
  2. Attacker cannot create a new privileged pod — webhook denies it.
  3. Attacker has pods/exec on an existing non-privileged pod in the same namespace.
  4. They exec into the pod and access secrets mounted there (service account token, env vars).
  5. With those credentials they escalate further.

The webhook was never invoked for step 3 or 4.

Detection

# Find webhooks with resources: ["*"] that are missing "*/*"
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json | \
  jq -r '
    .items[] |
    .metadata.name as $wh_name |
    .webhooks[]? |
    . as $hook |
    .rules[]? |
    select(.resources | index("*")) |
    select(.resources | index("*/*") | not) |
    "\($wh_name)/\($hook.name): resources=[\"*\"] without [\"*/*\"] — subresource ops bypass this rule"
  '

Recommended fixes

Operator (immediate): Replace every resources: ["*"] with resources: ["*", "*/*"] in webhook configurations where subresource interception is intended.

Per-subresource (explicit): For webhook rules that only need to cover specific high-risk subresources rather than all of them:

resources: ["pods", "pods/exec", "pods/attach", "pods/ephemeralcontainers"]

API server (long-term): Emit a Warning header during ValidatingWebhookConfiguration or MutatingWebhookConfiguration creation/update when any rule has resources containing "*" without "*/*". This is a pure UX addition with no behavior change.
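The predicate such a warning would need is a few lines (hypothetical helper, not existing apiserver code):

// Hypothetical helper: returns true when a rule's resources include the
// bare wildcard "*" but not "*/*", i.e. subresource requests bypass it.
func wildcardWithoutSubresources(resources []string) bool {
    hasBare, hasSub := false, false
    for _, r := range resources {
        switch r {
        case "*":
            hasBare = true
        case "*/*":
            hasSub = true
        }
    }
    return hasBare && !hasSub
}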

Prior art

kubernetes/kubernetes#115523 (CLOSED, kind/support, triage/accepted) — filed 2023-04, describes exactly this behavior. Jordan Liggitt confirmed it is intentional and matches the documented API: "*" matches all resources but not subresources; "*/*" is required for both. Closed as a documentation/support request, no behavior change planned.


Findings — LOW


LOW-1 — Request Header Authenticator: Split-Atomic TOCTOU [code verified]

File staging/src/k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader.go:121–141
Config storage requestheader_controller.go:82 (exportedRequestHeaderBundle atomic.Value)
Confidence 0.35 — very narrow window, no realistic exploit path

The controller uses atomic.Value to store the config bundle (loaded from the extension-apiserver-authentication ConfigMap). Each StringSliceProvider.Value() call is an atomic load — individually safe.

The subtle race is between two separate Value() calls in AuthenticateRequest:

// requestheader.go:121–141
func (a *requestHeaderAuthRequestHandler) AuthenticateRequest(req *http.Request) (...) {
    name := headerValue(req.Header, a.nameHeaders.Value())       // atomic load #1: old config
    uid  := headerValue(req.Header, a.uidHeaders.Value())
    groups := allHeaderValues(req.Header, a.groupHeaders.Value())
    extra  := newExtra(req.Header, a.extraHeaderPrefixes.Value())

    // ← ConfigMap update fires here, atomic.Store replaces the bundle

    ClearAuthenticationHeaders(req.Header,
        a.nameHeaders, a.uidHeaders, a.groupHeaders, a.extraHeaderPrefixes)
    // ClearAuthenticationHeaders calls Value() again (atomic load #2: new config)
    // → clears headers named in the NEW config, not the ones read by load #1
    // → old header names remain uncleaned in req.Header
}

If the ConfigMap changes header names (e.g., X-Remote-User → X-Custom-User) between loads #1 and #2:

  • Authentication succeeds using the X-Remote-User header (old config)
  • ClearAuthenticationHeaders deletes X-Custom-User (new config)
  • X-Remote-User is left in req.Header — it passes downstream uncleaned

The downstream aggregated API server, if it also trusts the X-Remote-User header from the proxy, would see the stale identity header pass through uncleared. In practice, ConfigMap updates are rare and the window is extremely short.

Recommended fix: Snapshot the provider value once and pass the snapshot to both AuthenticateRequest body and ClearAuthenticationHeaders:

nameHeaders := a.nameHeaders.Value()
// use nameHeaders (not a.nameHeaders) throughout
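Spelled out (a sketch; ClearAuthenticationHeaders would need a variant that accepts the snapshotted slices instead of the providers):

// Sketch: one snapshot per request, used for both the read and the cleanup,
// so a concurrent ConfigMap reload cannot change the header set in between.
nameHeaders := a.nameHeaders.Value()
uidHeaders := a.uidHeaders.Value()
groupHeaders := a.groupHeaders.Value()
extraPrefixes := a.extraHeaderPrefixes.Value()

name := headerValue(req.Header, nameHeaders)
uid := headerValue(req.Header, uidHeaders)
groups := allHeaderValues(req.Header, groupHeaders)
extra := newExtra(req.Header, extraPrefixes)

// hypothetical slice-taking variant of ClearAuthenticationHeaders
clearAuthenticationHeaderValues(req.Header, nameHeaders, uidHeaders, groupHeaders, extraPrefixes)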

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-2 — Token Cache: unsafe.String Aliases Pool Hash Buffer [code verified]

File staging/src/k8s.io/apiserver/pkg/authentication/token/cache/cached_token_authenticator.go:230–292
Confidence 0.30 — safe with all current Go stdlib hash implementations

The code:

// cached_token_authenticator.go:232–252
func keyFunc(hashPool *sync.Pool, auds []string, token string) string {
    h := hashPool.Get().(hash.Hash)
    h.Reset()
    // ...writes to h...
    key := toString(h.Sum(nil))  // ← unsafe alias
    hashPool.Put(h)              // ← pool returns h; Reset() may reuse internal buffer
    return key
}

// toString creates a string header pointing to the same memory as b
// without copying it:
func toString(b []byte) string {
    if len(b) == 0 { return "" }
    return unsafe.String(unsafe.SliceData(b), len(b))
}

h.Sum(nil) on Go's standard SHA256 implementation allocates a new slice for the result. The string returned by toString therefore aliases that newly-allocated slice — safe.

The fragile invariant: If a future hash implementation (or a hash registered via a crypto.RegisterHash plugin) returns a borrowed slice from Sum(nil) (an implementation detail not prohibited by the hash.Hash interface), then after hashPool.Put(h) and a subsequent h.Reset() by the pool, the memory backing key could be zeroed or overwritten. The cache key would then silently become "" or a corrupted string, causing a cache miss or a wrong cache hit.

Recommended fix:

key := string(h.Sum(nil))   // standard allocation; the unsafe optimization is not justified here

The cache key is computed on every request miss and stored once in the LRU cache. The extra allocation is negligible.

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-3 — RBAC AllowedSubjects: Deduplication Logic Silently Discarded [code verified]

File plugin/pkg/auth/authorizer/rbac/subject_locator.go:109–124
Confidence 0.25 — no current production caller relies on dedup

The code:

// subject_locator.go:109–124
dedupedSubjects := []rbacv1.Subject{}
for _, subject := range subjects {
    found := false
    for _, curr := range dedupedSubjects {
        if curr == subject {
            found = true
            break
        }
    }
    if !found {
        dedupedSubjects = append(dedupedSubjects, subject)
    }
}
return subjects, utilerrors.NewAggregate(errorlist)
//      ↑ returns the original undeduped slice; dedupedSubjects is thrown away

dedupedSubjects is built correctly, but subjects (the original undeduped list) is returned, so the O(n²) dedup loop is dead code. This is likely a regression from a refactor that changed the return statement without updating it to reference dedupedSubjects.

Current impact: No production caller depends on deduplication; they dedup themselves. The dead code introduces maintenance confusion and a false sense of correctness.

Recommended fix: Return dedupedSubjects instead of subjects.
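If the deduplication is worth keeping, a map keyed on the Subject value avoids the O(n²) scan (a sketch; rbacv1.Subject contains only comparable string fields, so it can key a map):

// Sketch: O(n) dedup keyed on the Subject value itself.
seen := make(map[rbacv1.Subject]struct{}, len(subjects))
dedupedSubjects := make([]rbacv1.Subject, 0, len(subjects))
for _, subject := range subjects {
    if _, ok := seen[subject]; ok {
        continue
    }
    seen[subject] = struct{}{}
    dedupedSubjects = append(dedupedSubjects, subject)
}
return dedupedSubjects, utilerrors.NewAggregate(errorlist)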

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-4 — CEL Authorization: FieldSelector Requirements Overwritten by RawSelector [code verified]

File staging/src/k8s.io/apiserver/pkg/authorization/cel/compile.go:289–303
Confidence 0.40 — both fields being set simultaneously is unusual

The code (repeated pattern for both FieldSelector and LabelSelector):

// compile.go:289–303
if len(obj.ResourceAttributes.FieldSelector.Requirements) > 0 {
    // builds requirements map and assigns:
    resourceAttributes[fieldSelectorVarName] = map[string]interface{}{"requirements": requirements}
}
if len(obj.ResourceAttributes.FieldSelector.RawSelector) > 0 {
    // overwrites the key just set above:
    resourceAttributes[fieldSelectorVarName] = map[string]interface{}{"rawSelector": obj.ResourceAttributes.FieldSelector.RawSelector}
}

If a SubjectAccessReview carries both Requirements and RawSelector (allowed by the API), the second if block overwrites the fieldSelector map entry written by the first block. CEL authorization expressions that inspect fieldSelector.requirements will evaluate against an empty/nil value. The identical pattern appears for LabelSelector at lines 306–319.

Practical risk: Authorization webhook auditors or RBAC CEL policies that check field selector requirements would silently receive empty requirements when RawSelector is also set. Policy decisions would be based on incomplete selector information.

Recommended fix:

fs := map[string]interface{}{}
if len(obj.ResourceAttributes.FieldSelector.Requirements) > 0 {
    fs["requirements"] = requirements
}
if len(obj.ResourceAttributes.FieldSelector.RawSelector) > 0 {
    fs["rawSelector"] = obj.ResourceAttributes.FieldSelector.RawSelector
}
if len(fs) > 0 {
    resourceAttributes[fieldSelectorVarName] = fs
}

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-5 — Webhook Dispatcher: Concurrent Write to versionedAttrs Map [code verified]

File staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go:69–94, 126–133
Confidence 0.50 — race exists but collision requires two webhooks for same GroupVersionKind

The map and the goroutines:

// dispatcher.go:69–84 (versionedAttributeAccessor)
type versionedAttributeAccessor struct {
    versionedAttrs map[schema.GroupVersionKind]*admission.VersionedAttributes
    // no mutex
}

func (v *versionedAttributeAccessor) VersionedAttribute(gvk schema.GroupVersionKind) (...) {
    if val, ok := v.versionedAttrs[gvk]; ok { return val, nil }  // concurrent read
    // ...
    v.versionedAttrs[gvk] = versionedAttr  // concurrent write
    return versionedAttr, nil
}
// dispatcher.go:126–133
wg := sync.WaitGroup{}
wg.Add(len(relevantHooks))
for i := range relevantHooks {
    go func(invocation *generic.WebhookInvocation, idx int) {
        // ...
        versionedAttr := versionedAttrAccessor.versionedAttrs[invocation.Kind]  // unprotected read

When two webhook goroutines both need the same invocation.Kind, one reads while the other may be writing. The Go race detector (-race) would flag this; in production, the race can manifest as a panic from a concurrent map read/write.

Mitigating factors: In practice, most webhook configurations use distinct resource versions per webhook. The race requires two webhooks with matching Kind in the same dispatch batch.

Recommended fix: Add a sync.Mutex to versionedAttributeAccessor, or pre-populate the map in the serial phase before launching goroutines.
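A sketch of the mutex option (the construction path elided as "// ..." in the snippet above is marked hypothetical here):

// Sketch: guard the lazily-populated map with a mutex.
type versionedAttributeAccessor struct {
    mu             sync.Mutex // guards versionedAttrs
    versionedAttrs map[schema.GroupVersionKind]*admission.VersionedAttributes
    // ...existing fields
}

func (v *versionedAttributeAccessor) VersionedAttribute(gvk schema.GroupVersionKind) (*admission.VersionedAttributes, error) {
    v.mu.Lock()
    defer v.mu.Unlock()
    if val, ok := v.versionedAttrs[gvk]; ok {
        return val, nil
    }
    versionedAttr, err := newVersionedAttributes(v, gvk) // hypothetical: the elided construction path
    if err != nil {
        return nil, err
    }
    v.versionedAttrs[gvk] = versionedAttr
    return versionedAttr, nil
}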

Prior art

kubernetes/kubernetes#120507 (CLOSED) and kubernetes/kubernetes#122940 (CLOSED) both describe apiserver panics (fatal error: concurrent map iteration and map write) from this exact race during webhook failures. Fixed by PR #129472 (MERGED, milestone v1.34). This finding is already fixed in master.


LOW-6 — Webhook Dispatcher: Deferred Closure Captures Variables Before Assignment [code verified]

File staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go:131–170
Confidence 0.75

The capture bug:

// dispatcher.go:130–170
go func(invocation *generic.WebhookInvocation, idx int) {
    ignoreClientCallFailures := false   // line 131 — zero value
    hookName := "unknown"               // line 132 — zero value

    defer wg.Done()
    defer func() { recover() }()
    defer utilruntime.HandleCrash(
        func(r interface{}) {
            // ↓ captures ignoreClientCallFailures and hookName by reference
            if ignoreClientCallFailures {                        // could be false (zero value)
                klog.Warningf("Panic calling webhook, failing open %v: %v", hookName, r)  // hookName = "unknown"
                // ...fail-open path
                return
            }
            errCh <- apierrors.NewInternalError(...)           // fail-closed path
        },
    )
    // ← defers are registered, but ignoreClientCallFailures and hookName
    //   are not yet set; they're assigned at lines 169–170:
    hook, ok := invocation.Webhook.GetValidatingWebhook()      // line 164
    // ...
    hookName = hook.Name                                        // line 169
    ignoreClientCallFailures = hook.FailurePolicy != nil && ... // line 170

If the goroutine panics between lines 132 and 169 (e.g., during GetValidatingWebhook or version negotiation), HandleCrash fires with:

  • hookName = "unknown" — misleading audit logs
  • ignoreClientCallFailures = false — the panic is treated as fail-closed (error returned), even if the webhook was configured with FailurePolicy: Ignore

This results in fail-closed behavior for a webhook that should fail open, which causes spurious admission denials when the webhook crashes before reading its own config.

Recommended fix: Move hookName and ignoreClientCallFailures assignment before the defer registration:

hook, ok := invocation.Webhook.GetValidatingWebhook()
if !ok { return }
hookName := hook.Name
ignoreClientCallFailures := hook.FailurePolicy != nil && *hook.FailurePolicy == v1.Ignore
defer wg.Done()
defer func() { recover() }()
defer utilruntime.HandleCrash(func(r interface{}) {
    // now captures correct values
})

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-7 — Admission Validator: Unfiltered Namespace Passed to Audit Annotation Filter

File staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go
Related staging/.../webhook/matchconditions/matcher.go:82
Confidence 0.80

When constructing CEL filter inputs for match conditions, the filtered (field-stripped) namespace is passed to the validation filter but the raw namespace is forwarded to the audit annotation filter. This asymmetry means an audit annotation CEL expression can read namespace fields (e.g., sensitive annotation values set by other controllers) that the validation CEL cannot.

This is a security boundary inconsistency, not a direct privilege escalation. It could allow audit annotations to inadvertently leak namespace metadata into audit logs.

Recommended fix: Apply the same namespace filter to both audit annotation and validation filter inputs for consistency.

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-8 — Namespace Lifecycle Admission: Inherent TOCTOU with Etcd

File staging/src/k8s.io/apiserver/pkg/admission/plugin/namespace/lifecycle/admission.go:115–165
Confidence 0.70 — well-understood limitation of the cache-based model

The admission plugin reads namespace phase from a local watch cache, not from etcd directly:

// admission.go:120–127
namespace, err := l.namespaceLister.Get(a.GetNamespace())
// ↑ returns a cached (potentially stale) namespace object

// forceLiveLookup fallback at line 149–163 uses a live GET only if the
// namespace was previously known-Terminating. A namespace deleted between
// cache sync and this request is not caught by the forceLiveLookup path.

An object admitted during this window (namespace Active in cache, Terminating in etcd) persists in the terminating namespace. The namespace controller will eventually clean it up, but the window allows brief inconsistency.

This is a known architectural limitation of Kubernetes' optimistic concurrency model — eliminating it would require a distributed lock for every admission decision. The code already implements a forceLiveLookup path for the most common case (Terminating detection).

Recommended documentation: Add a code comment explicitly marking this as an acknowledged TOCTOU with a reference to the design decision.

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-9 — NodeAuthorizer Allows Nodes to CREATE ResourceSlices Without NodeName Validation [code verified]

File plugin/pkg/auth/authorizer/node/node_authorizer.go:337–352
Confidence 0.85

The code:

// node_authorizer.go:337–352
func (r *NodeAuthorizer) authorizeResourceSlice(nodeName string, attrs authorizer.Attributes) (...) {
    // ...
    verb := attrs.GetVerb()
    switch verb {
    case "create":
        // The request must come from a node with the same name as the ResourceSlice.NodeName field.
        //
        // For create, the noderestriction admission plugin is performing this check.
        // Here we don't have access to the content of the new object.
        return authorizer.DecisionAllow, "", nil   // ← unconditional allow for all nodes
    case "get", "update", "patch", "delete":
        return r.authorize(nodeName, sliceVertexType, attrs)
    // ...
    }
}

The comment is explicit: NodeName validation for ResourceSlice CREATE is entirely delegated to the NodeRestriction admission plugin. The NodeAuthorizer cannot inspect the request body (it only sees attrs, which has metadata but not the object). This is a sound design choice — but it creates a defense-in-depth gap.

Risk scenario: If NodeRestriction is disabled (--disable-admission-plugins=NodeRestriction):

  • Any authenticated node can CREATE a ResourceSlice with any NodeName field value
  • A compromised node node-A can create slices claiming to represent device allocations for node-B
  • DRA scheduler and kubelet on node-B may act on phantom device advertisements from node-A

Mitigating factors: NodeRestriction is enabled by default and disabling it is explicitly documented as reducing the node security boundary. The CSINode and lease authorizers have the same pattern (see lines 299–300, 327–329 in node_authorizer.go).

Recommended fix: Document the dependency more prominently. Consider adding a startup warning when the DRA feature gates are enabled and NodeRestriction is disabled.
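A hedged sketch of such a warning (hypothetical wiring; the plugin-set accessor is illustrative only):

// Hypothetical startup check: warn when DRA is enabled but the admission
// chain no longer validates ResourceSlice.NodeName on create.
if utilfeature.DefaultFeatureGate.Enabled(features.DynamicResourceAllocation) &&
    !enabledAdmissionPlugins.Has("NodeRestriction") { // hypothetical accessor
    klog.Warning("DRA is enabled but NodeRestriction is disabled: " +
        "nodes may create ResourceSlices with arbitrary NodeName values")
}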

Prior art

No public tracking issue or PR found as of 2026-05-05.


LOW-10 — Mutating Webhook Reinvocation: reinvokeWebhooks Set Is Correct [revised]

Original finding reinvokeRequested flag not cleared between admission rounds
Assessment after code review Finding is not confirmed

Original researcher described a reinvokeRequested boolean that isn't cleared. The actual code uses a sets.Set[string] (reinvokeWebhooks) populated by RequireReinvokingPreviouslyInvokedPlugins() from previouslyInvokedReinvocableWebhooks.

The flow is correct:

  1. First pass: previouslyInvokedReinvocableWebhooks accumulates webhook UIDs
  2. On mutation: RequireReinvokingPreviouslyInvokedPlugins() copies → reinvokeWebhooks, clears source
  3. Second pass: ShouldReinvokeWebhook(uid) checks reinvokeWebhooks — only correctly flagged webhooks are reinvoked

The reinvokeWebhooks set is not cleared before the second pass, but that's intentional — it IS the second-pass inclusion list. Finding revised to not a bug; no fix needed.

Prior art

No tracking issue applicable — finding is not a bug.


Dynamic Harness — kubectl Status

Harness results directory: /Users/dsrinivas/go/src/k8s.io/kubernetes/results/kubectl-k8s/20260505T135140Z/

├── found_bugs.jsonl        ← empty (0 crashes confirmed so far)
├── recon_transcript.jsonl  ← auto-focus completed, assigned focus areas
├── run_000/find_transcript.jsonl  (109 entries, still running)
├── run_001/find_transcript.jsonl  (137 entries, still running)
└── run_002/find_transcript.jsonl  (139 entries, still running)

Observed agent behavior:

  • run_001 (quantity parsing): Tried memory=1e9223372036854775807, memory=1e-9223372036854775808, memory=9999999999999999999e9999999999999999999. All returned "unable to parse quantity's suffix", exit 1 — graceful error handling, no panic. The k8s.io/apimachinery/pkg/api/resource quantity parser appears well-hardened against exponent overflow.
  • run_002 (kubectl cp): Was reading cp.go source to identify tar extraction entry points. Key function: (*CopyOptions).untarAll — path traversal check at line ~327 uses filepath.Clean + prefix check. Whether the prefix check is bypassable with symlinks or special archive entries is still under investigation; a generic sketch of the idiom follows this list.
  • run_000 (kubeconfig): Exploring client certificate handling paths.
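For context, the Clean-plus-prefix idiom under scrutiny looks roughly like this (a generic sketch, not the verbatim cp.go code):

// Generic sketch of the guard: reject any archive entry whose joined,
// cleaned path escapes the destination directory. Symlinked intermediate
// directories are the classic bypass for exactly this check, hence the focus.
dest := filepath.Join(destDir, header.Name) // Join also runs filepath.Clean
if !strings.HasPrefix(dest, filepath.Clean(destDir)+string(os.PathSeparator)) {
    return fmt.Errorf("tar entry %q would escape %s", header.Name, destDir)
}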

Why no crashes yet: kubectl is a client-side tool with aggressive input validation. Most malformed inputs return error: ... and exit 1. The race detector requires CGO, which was disabled for cross-compilation — data races in kubectl internals would not surface here. High-value targets (kubectl cp path traversal, deeply nested YAML stack overflow) require more targeted PoC construction than simple fuzzing.


Prior Art Summary

Finding    Issue / PR           Status
MEDIUM-1   None found           Self-acknowledged in code (controller.go:315); no public tracking ticket
MEDIUM-2   None found           Novel; fix on branch fix/graph-populator-extended-resource-claim
MEDIUM-3   #115523              CLOSED (kind/support) — intentional, documented behavior; confirmed by liggitt
LOW-1      None found           —
LOW-2      None found           —
LOW-3      None found           —
LOW-4      None found           —
LOW-5      #120507, #122940     Already fixed in PR #129472 (MERGED, v1.34)
LOW-6      None found           —
LOW-7      None found           —
LOW-8      None found           —
LOW-9      None found           —
LOW-10     N/A                  Not a bug

Risk Priority Matrix

Priority             Action                                                                                                       Finding    Effort
P1 — immediate       Audit all webhook configurations for resources: ["*"] without "*/*"                                          MEDIUM-3   Low (one-liner detection)
P1 — immediate       Fix graph_populator.go:114 to include PodExtendedStatusEqual                                                 MEDIUM-2   Low (one-line fix)
P2 — short-term      Add integration test: ResourceClaim update with no ResourceClaimStatuses change → node can still read claim  MEDIUM-2   Medium
P2 — short-term      Investigate quota batching TOCTOU; add etcd CAS for UpdateQuotaStatus                                        MEDIUM-1   High
P3 — backlog         Emit webhook warning on resources: ["*"] without "*/*"                                                       MEDIUM-3   Medium
P3 — backlog         Fix versionedAttrs map concurrent access with mutex                                                          LOW-5      Low
P3 — backlog         Hoist hookName/ignoreClientCallFailures before defers                                                        LOW-6      Low
P3 — backlog         Add atomic snapshot in requestheader AuthenticateRequest                                                     LOW-1      Low
P4 — housekeeping    Replace unsafe.String(h.Sum(nil)) with string(h.Sum(nil))                                                    LOW-2      Trivial
P4 — housekeeping    Return dedupedSubjects in AllowedSubjects                                                                    LOW-3      Trivial
P4 — housekeeping    Fix CEL fieldSelector/labelSelector double-write in compile.go                                               LOW-4      Low
P4 — housekeeping    Apply uniform namespace filter to both admission filters                                                     LOW-7      Low
P5 — documentation   Add comment acknowledging namespace lifecycle TOCTOU and link to design decision                             LOW-8      Trivial
P5 — documentation   Add startup warning: NodeRestriction disabled + DRA active                                                   LOW-9      Low

Scope and Limitations

Included:

  • plugin/pkg/auth/authenticator/ — request header, token, x509
  • staging/src/k8s.io/apiserver/pkg/authentication/ — token cache, OIDC, bootstrap
  • plugin/pkg/auth/authorizer/ — RBAC, node, union
  • staging/src/k8s.io/apiserver/pkg/authorization/ — CEL, webhook
  • plugin/pkg/admission/ — resourcequota, security context
  • staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/ — validating, mutating, reinvocation
  • staging/src/k8s.io/apiserver/pkg/admission/plugin/namespace/lifecycle/
  • plugin/pkg/auth/authorizer/node/ — NodeAuthorizer, graph, graph_populator
  • kubectl cp, kubectl apply/create, --kubeconfig (dynamic harness)

Excluded (out of scope):

  • vendor/ — third-party dependencies
  • **/*.pb.go — protobuf-generated code
  • **/zz_generated_*.go — generated deepcopy/defaulting/conversion
  • third_party/ — bundled external code
  • pkg/kubelet/ — node agent (cgroup access required for faithful assessment)
  • test/ and **/*_test.go — test code
  • Full kube-apiserver startup path (requires etcd + TLS bootstrap)

Dynamic harness caveats:

  • CGO disabled for cross-compilation (Linux/arm64 from macOS host) — Go race detector not available. Data races in kubectl internals are not surfaced. Static analysis partially compensates.
  • Network-dependent code paths (live API server, etcd) are out of scope.
  • Results were incomplete at report publication time. found_bugs.jsonl will be updated if crashes land after this report is generated.

Report generated: 2026-05-05
Assessment: automated multi-agent (vuln-harness-scan) + source code cross-verification
Contact: dsrinivas@nvidia.com
