Repository: kubernetes/kubernetes
Commit: 47f990437458a2b171f51b5e97a0c28c81d949d1 (master, 2026-05-05)
Methods: Static multi-agent source review (87 files across 4 researchers) + dynamic execution harness (kubectl, 3 agents)
Subsystems: authentication, authorization/RBAC, admission control/webhooks, node authorization (NodeAuthorizer + DRA graph)
- Executive Summary
- Methodology
- Findings — MEDIUM
- Findings — LOW
- LOW-1 — Request Header Authenticator: Split-Atomic TOCTOU
- LOW-2 — Token Cache: unsafe.String Aliases Pool Hash Buffer
- LOW-3 — RBAC AllowedSubjects: Deduplication Logic Silently Discarded
- LOW-4 — CEL Authorization: FieldSelector Requirements Overwritten by RawSelector
- LOW-5 — Webhook Dispatcher: Concurrent Write to versionedAttrs Map
- LOW-6 — Webhook Dispatcher: Deferred Closure Captures Variables Before Assignment
- LOW-7 — Admission Validator: Unfiltered Namespace Passed to Audit Annotation Filter
- LOW-8 — Namespace Lifecycle Admission: Inherent TOCTOU with Etcd
- LOW-9 — NodeAuthorizer Allows Nodes to CREATE ResourceSlices Without NodeName Validation
- LOW-10 — Mutating Webhook Reinvocation: reinvokeWebhooks Set Is Correct
- Dynamic Harness — kubectl Status
- Risk Priority Matrix
- Scope and Limitations
13 security-relevant findings were identified across four subsystems:
| Severity | Count | Highest-impact subsystem |
|---|---|---|
| MEDIUM | 3 | Admission (quota bypass), Node auth (DRA graph), Webhook predicates |
| LOW | 10 | Authn, Authz/CEL, Admission (dispatcher, validator, reinvocation, namespace), Node/RBAC |
No CRITICAL or HIGH findings confirmed. The three MEDIUM findings are:
- Concurrent ResourceQuota bypass (`controller.go:228`) — an acknowledged comment in the code confirms the retry logic can evaluate quota against a state that "never actually exists". Authenticated attackers with CREATE rights can transiently burst through quota limits.
- DRA node graph misses ExtendedResourceClaim edges (`graph_populator.go:112`) — `PodStatusEqual` only compares `ResourceClaimStatuses`, so when `ExtendedResourceClaimStatus` changes independently, the fast path skips `AddPod` and the `extendedClaim → pod → node` graph edge is never added. `AddPod` explicitly handles this field at `graph.go:431` — the fast-path guard doesn't know about it.
- Webhook wildcard `resources: ["*"]` silently excludes subresources (`rules.go:106`) — operator-level misconfiguration that renders admission security controls ineffective against `pods/exec`, `pods/log`, and similar subresource operations. The wildcard parsing logic confirms this: `sub == "*"` only matches when the rule entry itself contains a `/` separator.
The dynamic kubectl harness is still running. No crashes have landed. Quantity parsing is correctly
rejecting extreme exponents with error exits (not panics). kubectl cp path traversal exploration
was in progress at report time.
Four parallel researchers analyzed ~87 files across security-critical subsystems:
| Researcher | Subsystem | Files | Focus |
|---|---|---|---|
| R1 | Authentication | 26 | plugin/pkg/auth/authenticator/, staging/.../authentication/ |
| R2 | Authorization + RBAC | 16 | plugin/pkg/auth/authorizer/, pkg/registry/rbac/ |
| R3 | Admission control + webhooks | 28 | plugin/pkg/admission/, staging/.../admission/plugin/ |
| R4 | K8s cross-cutting specialist | 17 | NodeAuthorizer, graph populator, CEL authz, webhook predicates |
Each researcher read source files directly and filed structured findings. A deduplication pass
removed two overlapping observations. Findings marked with [code verified] have been confirmed
against the actual source at the commit listed above.
A kubectl binary was cross-compiled (Linux/arm64, CGO_ENABLED=0) from commit 47f9904 and
packaged into a Docker image (vuln-harness-kubectl-k8s:latest) with relevant source. Three
find-agents ran in parallel:
| Agent | Focus | Preliminary signal |
|---|---|---|
| run_000 | kubeconfig parsing, client cert handling | Still exploring |
| run_001 | Resource quantity parsing (integer overflow) | 1e9223372036854775807 → clean "unable to parse" error, exit 1 |
| run_002 | kubectl cp tar extraction (path traversal) | Reading cp.go source for attack vector analysis |
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/resourcequota/controller.go |
| Lines | 50–341 (quotaEvaluator struct + checkQuotas); bug acknowledged at line 315 |
| Confidence | 0.65 |
| CVSS estimate | AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:N ~3.7 |
| Exploit difficulty | Medium — requires controlled request timing; most effective in multi-master clusters |
The bug — in the code's own words:
```go
// controller.go:315–317
// this retry logic has the same bug that its possible to be checking against quota
// in a state that never actually exists where you've added a new documented, then
// updated an old one, your resource matches both and you're only checking one
```

This comment has existed for years. The Kubernetes team is aware of it.
Every admission request that touches a quota-tracked resource goes through quotaEvaluator.Evaluate(),
which adds an admissionWaiter to a namespace-keyed work queue and blocks:
```go
// controller.go:650–685
func (e *quotaEvaluator) Evaluate(a admission.Attributes) error {
	e.init.Do(e.start)
	// ...
	waiter := newAdmissionWaiter(a)
	e.addWork(waiter)
	select {
	case <-waiter.finished: // unblocks when checkQuotas closes this channel
	case <-time.After(10 * time.Second):
		return apierrors.NewInternalError(...)
	}
	return waiter.result
}
```

`addWork` places the waiter into one of two maps keyed by namespace:
```go
// controller.go:688–702
func (e *quotaEvaluator) addWork(a *admissionWaiter) {
	e.workLock.Lock()
	defer e.workLock.Unlock()
	ns := a.attributes.GetNamespace()
	e.queue.Add(ns)
	if e.inProgress.Has(ns) {
		e.dirtyWork[ns] = append(e.dirtyWork[ns], a) // arrives while batch is running
		return
	}
	e.work[ns] = append(e.work[ns], a) // normal path
}
```

A pool of goroutines (`doWork`) continuously drains `work[ns]` batches. When a batch for
namespace X is dequeued with getWork(), the namespace is added to inProgress and all
subsequent requests for X land in dirtyWork[ns] until completeWork swaps dirty→work.
Crucially, requests that arrive during an in-progress batch form the NEXT batch; they do not
get a second bite at the current evaluation.
checkAttributes reads all quota objects for the namespace once at the start, then calls
checkQuotas with remainingRetries=3:
```go
// controller.go:184–213
func (e *quotaEvaluator) checkAttributes(ns string, admissionAttributes []*admissionWaiter) {
	// ...
	quotas, err := e.quotaAccessor.GetQuotas(ns) // one read for the whole batch
	// ...
	e.checkQuotas(quotas, admissionAttributes, 3)
}
```

Inside `checkQuotas`, requests are evaluated serially against a running `quotas` slice.
Each admitted request updates quotas = newQuotas so the next one sees the incremented usage:
```go
// controller.go:236–266
for i := range admissionAttributes {
	admissionAttribute := admissionAttributes[i]
	newQuotas, err := e.checkRequest(quotas, admissionAttribute.attributes)
	// ...
	quotas = newQuotas // carries forward: request N+1 sees N's usage
}
```

Within one batch this is correct: if quota is 8/10 and 5 requests arrive, the first two are admitted (8→9→10), the next three are denied (10/10 full). No over-admission.
After the sequential check, each quota that changed is written to etcd via UpdateQuotaStatus:
```go
// controller.go:288
if err := e.quotaAccessor.UpdateQuotaStatus(&newQuota); err != nil {
	updatedFailedQuotas = append(updatedFailedQuotas, newQuota)
	lastErr = err
}
```

`UpdateQuotaStatus` issues a `CoreV1().ResourceQuotas().UpdateStatus()` call. etcd enforces
optimistic concurrency using the resourceVersion field: if another API server replica wrote
to the same quota object between the initial GetQuotas and this UpdateStatus, the call
returns a 409 Conflict. This is the normal mechanism that prevents lost updates.
On conflict, checkQuotas re-fetches the quota and recurses:
```go
// controller.go:319–341
newQuotas, err := e.quotaAccessor.GetQuotas(quotas[0].Namespace)
// ...
quotasToCheck := []corev1.ResourceQuota{}
for _, newQuota := range newQuotas {
	for _, oldQuota := range updatedFailedQuotas {
		if newQuota.Name == oldQuota.Name {
			quotasToCheck = append(quotasToCheck, newQuota)
		}
	}
}
e.checkQuotas(quotasToCheck, admissionAttributes, remainingRetries-1)
```

The problem: `GetQuotas` reads from the informer watch cache (the lister at line 113),
not directly from etcd. The watch cache has an inherent propagation lag — it reflects etcd
state as of the last watch event, which may be hundreds of milliseconds stale. In a
multi-master cluster, three API server replicas each maintaining their own watch caches:
API server A: reads quota 8/10 (informer), admits 2, tries UpdateStatus → CONFLICT (B wrote first)
API server B: reads quota 8/10 (informer), admits 2, UpdateStatus succeeds → etcd now 10/10
API server C: reads quota 8/10 (informer), admits 2, tries UpdateStatus → CONFLICT (B wrote first)
A retries: GetQuotas → informer still shows 8/10 (watch not propagated yet)
→ checkQuotas re-runs both admitted requests → admits both again
→ UpdateStatus → 12/10 if it wins, or another conflict and another retry
C retries: same scenario
After 3 retries (remainingRetries starts at 3), remainingRetries <= 0 causes all
still-pending defaultDeny results to receive lastErr — but requests already cleared
from defaultDeny (i.e., those admitted in a prior retry round) are NOT reverted.
The admissionWaiter.result for admitted requests stays nil.
Setup: 3 API server replicas, namespace quota pods=10, currently 8 used.
Attacker (with create pods permission) fires a burst of 6 concurrent requests,
2 landing on each API server.
Each replica's quotaEvaluator:
A: batch [req1, req2] → checkRequest(8/10)→admits req1 (9/10), admits req2 (10/10)
B: batch [req3, req4] → checkRequest(8/10)→admits req3 (9/10), admits req4 (10/10)
C: batch [req5, req6] → checkRequest(8/10)→admits req5 (9/10), admits req6 (10/10)
UpdateStatus race: one replica wins (e.g. B → etcd: 10/10), A and C get 409.
A retry: GetQuotas → watch cache still 8/10 → re-admits req1, req2 again.
C retry: GetQuotas → watch cache still 8/10 → re-admits req5, req6 again.
Result: up to 14 pods in a namespace with quota=10.
(In practice, some retries will also conflict, limiting over-admission.)
- Over-admitted pods are not cleaned up — garbage collection does not remove pods that exceeded quota after the fact. Only explicit deletion removes them.
- The over-admission is bounded by `remainingRetries=3` and the number of concurrent API server replicas. Realistic over-admission: `quota_limit × number_of_api_servers`.
- Only affects resources tracked by `ResourceQuota` (pods, services, pvcs, configmaps, etc.)
- The attacker needs only `create` permission on a quota-tracked resource — a low bar in shared-cluster environments.
Recommended fix: Change UpdateQuotaStatus to use a conditional update that verifies the
resourceVersion returned from the initial GetQuotas read. If the version has advanced
(another replica wrote), treat it as a definitive conflict rather than a retriable error, and
deny the entire batch rather than re-evaluating with a stale baseline.
No public tracking issue or PR found as of 2026-05-05. The acknowledged comment at controller.go:315 has existed for years without a filed issue.
| File | plugin/pkg/auth/authorizer/node/graph_populator.go:102–125 |
| Graph edge added by | plugin/pkg/auth/authorizer/node/graph.go:431–434 |
| Feature gate | DRAExtendedResource — Alpha in 1.34, Beta (default: true) in 1.36 |
| Confidence | 0.90 — three-file cross-reference confirms the gap |
| CVSS estimate | AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H ~4.9 |
| Exploit difficulty | N/A — this is a false-negative (too-restrictive authorization), not privilege escalation |
Before DRA, GPUs were exposed to Kubernetes via the device plugin API: a per-node kubelet
mechanism where the node agent tracked inventory and allocations locally. The DRAExtendedResource
feature gate bridges the two worlds. When enabled, a pod requesting an extended resource like
resources.limits: nvidia.com/gpu: "1" can have that request fulfilled via DRA rather than
the legacy device plugin path — without changing the pod spec.
When the scheduler places such a pod, it synthesizes a ResourceClaim object whose name
it writes into pod.Status.ExtendedResourceClaimStatus.ResourceClaimName. This claim is NOT
listed in pod.Spec.ResourceClaims; it is an out-of-band scheduler artifact. Kubelet must
read this claim to call NodePrepareResources on the DRA driver, which is the call that
actually makes the device available to the container.
The PodStatus struct has two distinct DRA-related fields:
```go
// staging/src/k8s.io/api/core/v1/types.go:5452–5456
ResourceClaimStatuses []PodResourceClaimStatus // standard DRA: from spec.ResourceClaims
ExtendedResourceClaimStatus *PodExtendedResourceClaimStatus // extended resource DRA: scheduler-synthesized
// +featureGate=DRAExtendedResource
```

kubelet wants to start GPU pod
→ reads pod.Status.ExtendedResourceClaimStatus.ResourceClaimName ("pod-abc-gpu-claim")
→ calls NodePrepareResources on DRA driver with that claim
→ must first GET resourceclaims/pod-abc-gpu-claim from API server
→ NodeAuthorizer checks hasPathFrom("node-A", resourceClaimVertexType, ns, "pod-abc-gpu-claim")
→ checks the in-memory graph for edge: extendedClaim → pod → node
→ edge was never added because AddPod() was skipped by the fast-path
→ hasPathFrom returns false → 403 Forbidden
→ NodePrepareResources never called → pod stuck in ContainerCreating
Confirmed in pkg/kubelet/cm/dra/manager.go:255–263:
```go
if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.DRAExtendedResource) {
	if pod.Status.ExtendedResourceClaimStatus != nil {
		extendedResourceClaim := v1.PodResourceClaim{
			ResourceClaimName: &pod.Status.ExtendedResourceClaimStatus.ResourceClaimName,
		}
		podResourceClaims = append(podResourceClaims, extendedResourceClaim)
	}
}
// podResourceClaims is then iterated to call NodePrepareResources for each claim
```

`AddPod` in graph.go explicitly handles `ExtendedResourceClaimStatus` at line 431:
```go
// graph.go:431–434
if pod.Status.ExtendedResourceClaimStatus != nil &&
	len(pod.Status.ExtendedResourceClaimStatus.ResourceClaimName) > 0 {
	claimVertex := g.getOrCreateVertexLocked(resourceClaimVertexType,
		pod.Namespace, pod.Status.ExtendedResourceClaimStatus.ResourceClaimName)
	g.addEdgeLocked(claimVertex, podVertex, nodeVertex) // edge: extendedClaim → pod → node
}
```

But `graph_populator.updatePod` has a fast-path that skips calling `AddPod` when certain
pod fields are unchanged. The guard uses resourceclaim.PodStatusEqual, which compares only
ResourceClaimStatuses (the standard DRA field):
```go
// graph_populator.go:109–118
if oldPod, ok := oldObj.(*corev1.Pod); ok && oldPod != nil {
	hasNewEphemeralContainers := len(pod.Spec.EphemeralContainers) > len(oldPod.Spec.EphemeralContainers)
	if (pod.Spec.NodeName == oldPod.Spec.NodeName) && (pod.UID == oldPod.UID) &&
		!hasNewEphemeralContainers &&
		resourceclaim.PodStatusEqual( // ← compares ResourceClaimStatuses only
			oldPod.Status.ResourceClaimStatuses,
			pod.Status.ResourceClaimStatuses) {
		return // ← AddPod() never called
	}
}
```

`PodStatusEqual` in resourceclaim/pod.go:34–50 compares the `Name` and `ResourceClaimName` fields of `[]PodResourceClaimStatus`. It has no knowledge of the separate `*PodExtendedResourceClaimStatus` pointer.
A pod with limits: nvidia.com/gpu: "1" and no spec.ResourceClaims (the dominant real-world
pattern for GPU workloads today):
T1 Scheduler binds pod to node-A (Spec.NodeName = "node-A")
informer fires: updatePod(nil, pod)
→ oldPod is nil → fast-path not evaluated → AddPod() called ✓
→ but ExtendedResourceClaimStatus is still nil at T1 → no edge added (nothing to add yet)
T2 Scheduler creates synthetic ResourceClaim "pod-abc-gpu-claim" and writes status:
pod.Status.ExtendedResourceClaimStatus = {ResourceClaimName: "pod-abc-gpu-claim"}
informer fires: updatePod(oldPod_T1, newPod_T2)
→ old.Spec.NodeName == new.Spec.NodeName ✓
→ old.UID == new.UID ✓
→ !hasNewEphemeralContainers ✓
→ PodStatusEqual(old.ResourceClaimStatuses, new.ResourceClaimStatuses)
= PodStatusEqual(nil, nil) → true ← both nil; pod has no spec.ResourceClaims
→ FAST-PATH FIRES → AddPod() is NOT called ✗
→ edge "pod-abc-gpu-claim" → pod → node-A NEVER added to graph
T3 kubelet calls GET resourceclaims/pod-abc-gpu-claim
NodeAuthorizer: hasPathFrom("node-A", resourceClaimVertexType, ns, "pod-abc-gpu-claim")
→ startingVertex found in graph, but no edge to node-A vertex
→ returns false → DecisionNoOpinion → 403 Forbidden
→ kubelet DRA manager cannot call NodePrepareResources → pod stuck forever
The scheduler's own pod event handler — pkg/scheduler/framework/events.go:161–167 — already
correctly uses both equality functions to detect changes:
```go
func extractPodGeneratedResourceClaimChange(newPod *v1.Pod, oldPod *v1.Pod) fwk.ActionType {
	if !resourceclaim.PodStatusEqual(newPod.Status.ResourceClaimStatuses, oldPod.Status.ResourceClaimStatuses) ||
		!resourceclaim.PodExtendedStatusEqual(newPod.Status.ExtendedResourceClaimStatus, oldPod.Status.ExtendedResourceClaimStatus) {
		return fwk.UpdatePodGeneratedResourceClaim
	}
	return fwk.None
}
```

The graph populator was not updated to match when `ExtendedResourceClaimStatus` was added.
It is a consistency gap between two independent pod-status watchers in the same binary.
Any cluster running Kubernetes 1.36+ (where DRAExtendedResource=true by default) with:
- GPU or other extended resource workloads being migrated to the DRA path (NVIDIA GPU Operator DRA driver, Intel GPU DRA driver, etc.)
- Pods that use `resources.limits: vendor.com/device: "1"` without explicit `spec.ResourceClaims`

Clusters on 1.34 or 1.35 with DRAExtendedResource=true set explicitly are also affected.
One line, in graph_populator.go:

```go
// Change the fast-path guard from:
resourceclaim.PodStatusEqual(
	oldPod.Status.ResourceClaimStatuses,
	pod.Status.ResourceClaimStatuses)
// To:
resourceclaim.PodStatusEqual(
	oldPod.Status.ResourceClaimStatuses,
	pod.Status.ResourceClaimStatuses) &&
	resourceclaim.PodExtendedStatusEqual(
		oldPod.Status.ExtendedResourceClaimStatus,
		pod.Status.ExtendedResourceClaimStatus)
```

`PodExtendedStatusEqual` is already defined in staging/src/k8s.io/dynamic-resource-allocation/resourceclaim/pod.go:52.
The scheduler already calls it for exactly this purpose.
No public tracking issue or PR found as of 2026-05-05. Fix is available on branch fix/graph-populator-extended-resource-claim.
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/predicates/rules/rules.go:106–116 |
| Test coverage | rules_test.go:255–268 explicitly tests and expects this behavior (see below) |
| Confidence | 0.90 — intended design, dangerous in practice, no API-server-level warning |
| CVSS estimate | AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H ~8.5 if the webhook is a security control |
| Exploit difficulty | Zero once the misconfiguration exists; requires operator error to set up |
Matcher.Matches() calls five sub-matchers: scope, operation, group, version, resource.
The resource() function is where subresources diverge from the intuitive wildcard behavior:
```go
// rules.go:98–116
func splitResource(resSub string) (res, sub string) {
	parts := strings.SplitN(resSub, "/", 2)
	if len(parts) == 2 {
		return parts[0], parts[1] // "pods/exec" → ("pods", "exec")
	}
	return parts[0], "" // "*" → ("*", "")
}

func (r *Matcher) resource() bool {
	opRes, opSub := r.Attr.GetResource().Resource, r.Attr.GetSubresource()
	for _, res := range r.Rule.Resources {
		res, sub := splitResource(res)
		resMatch := res == "*" || res == opRes // wildcard "*" matches any resource name
		subMatch := sub == "*" || sub == opSub // BUT: sub="" only matches opSub=""
		if resMatch && subMatch {
			return true
		}
	}
	return false
}
```

For a rule entry of `"*"`: `sub = ""`, so `subMatch = ("" == "*") || ("" == opSub)`. For a `pods/exec` request, `opSub = "exec"`, so `subMatch = false || false = false`. No match.
The unit test at rules_test.go:255–268 documents this exact behavior as the expected contract:
```go
"no subresources": {
	rule: adreg.RuleWithOperations{
		Rule: adreg.Rule{Resources: []string{"*"}},
	},
	match: attrList(
		a("g", "v", "r", "", "name", admission.Create, ...),  // no subresource → MATCHES
		a("2", "v", "r2", "", "name", admission.Create, ...), // no subresource → MATCHES
	),
	noMatch: attrList(
		a("g", "v", "r", "exec", "name", admission.Create, ...),   // subresource → NO MATCH
		a("2", "v", "r2", "proxy", "name", admission.Create, ...), // subresource → NO MATCH
	),
},
```

The behavior is not a bug in the matching code — it is specified, tested, and documented
in the API reference. The problem is that operators routinely misread "*" as "everything"
when it actually means "every resource with no subresource".
Understanding what each pattern actually matches:

| Rule `resources` entry | `res` | `sub` | Matches (resource, subresource) | Does NOT match |
|---|---|---|---|---|
| `"*"` | `"*"` | `""` | (pods, ""), (services, "") | (pods, exec), (pods, log) |
| `"*/*"` | `"*"` | `"*"` | (pods, exec), (pods, log) | (pods, "") — no subresource |
| `"pods"` | `"pods"` | `""` | (pods, "") | (pods, exec) |
| `"pods/*"` | `"pods"` | `"*"` | (pods, exec), (pods, log) | (pods, ""), (services, exec) |
| `"pods/exec"` | `"pods"` | `"exec"` | (pods, exec) | (pods, log), (pods, "") |
| `"*/exec"` | `"*"` | `"exec"` | (pods, exec), (services, exec) | (pods, log), (pods, "") |
To cover all operations on all resources including all subresources, you need both:

```yaml
resources: ["*", "*/*"]
```

because neither `"*"` nor `"*/*"` alone covers both the resource-level and subresource-level operations simultaneously.
Any operation where `GetSubresource()` returns a non-empty string is excluded:

| Operation | Subresource string | Who can call it |
|---|---|---|
| `kubectl exec` | `exec` | Any user with pods/exec RBAC |
| `kubectl logs` | `log` | Any user with pods/log RBAC |
| `kubectl attach` | `attach` | Any user with pods/attach RBAC |
| `kubectl port-forward` | `portforward` | Any user with pods/portforward RBAC |
| `kubectl cp` | `exec` (streams via exec) | Any user with pods/exec RBAC |
| Deployment scale | `scale` | Any user with deployments/scale RBAC |
| Pod ephemeral containers | `ephemeralcontainers` | Any user with pods/ephemeralcontainers |
| Node proxy | `proxy` | Any user with nodes/proxy RBAC |
| Pod status update | `status` | Controllers, operators with pods/status |
| Token requests | `token` | Any user with serviceaccounts/token |
OPA/Gatekeeper, Kyverno, and Kubewarden all have community policy libraries with entries like:
```yaml
# common in community policies — DOES NOT intercept exec/log/attach
rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    resources: ["*"]
    operations: ["CREATE", "UPDATE", "DELETE"]
```

An attacker with `pods/exec` permission (but blocked from creating new privileged pods by
a Kyverno policy using this pattern) can exec into an existing pod and escape without
the policy webhook ever firing. The policy author believed they covered all pod operations;
exec bypasses it entirely.
Concrete scenario:
1. Cluster has a Kyverno policy blocking privileged pod creation (`resources: ["pods"]`).
2. Attacker cannot create a new privileged pod — webhook denies it.
3. Attacker has `pods/exec` on an existing non-privileged pod in the same namespace.
4. They exec into the pod and access secrets mounted there (service account token, env vars).
5. With those credentials they escalate further.

The webhook was never invoked for step 3 or 4.
```shell
# Find webhooks with resources: ["*"] that are missing "*/*"
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json | \
jq -r '
  .items[] |
  .metadata.name as $wh_name |
  .webhooks[]? |
  . as $hook |
  .rules[]? |
  select(.resources | index("*")) |
  select(.resources | index("*/*") | not) |
  "\($wh_name)/\($hook.name): resources=[\"*\"] without [\"*/*\"] — subresource ops bypass this rule"
'
```

Operator (immediate): Replace every `resources: ["*"]` with `resources: ["*", "*/*"]`
in webhook configurations where subresource interception is intended.
Per-subresource (explicit): For webhook rules that only need to cover specific high-risk subresources rather than all of them:

```yaml
resources: ["pods", "pods/exec", "pods/attach", "pods/ephemeralcontainers"]
```

API server (long-term): Emit a Warning header during ValidatingWebhookConfiguration
or MutatingWebhookConfiguration creation/update when any rule has resources containing
"*" without "*/*". This is a pure UX addition with no behavior change.
kubernetes/kubernetes#115523 (CLOSED, kind/support, triage/accepted) — filed 2023-04, describes exactly this behavior. Jordan Liggitt confirmed it is intentional and matches the documented API: "*" matches all resources but not subresources; "*/*" is required for both. Closed as a documentation/support request, no behavior change planned.
| File | staging/src/k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader.go:121–141 |
| Config storage | requestheader_controller.go:82 (exportedRequestHeaderBundle atomic.Value) |
| Confidence | 0.35 — very narrow window, no realistic exploit path |
The controller uses atomic.Value to store the config bundle (loaded from the
extension-apiserver-authentication ConfigMap). Each StringSliceProvider.Value() call is
an atomic load — individually safe.
The subtle race is between two separate Value() calls in AuthenticateRequest:
```go
// requestheader.go:121–141
func (a *requestHeaderAuthRequestHandler) AuthenticateRequest(req *http.Request) (...) {
	name := headerValue(req.Header, a.nameHeaders.Value()) // atomic load #1: old config
	uid := headerValue(req.Header, a.uidHeaders.Value())
	groups := allHeaderValues(req.Header, a.groupHeaders.Value())
	extra := newExtra(req.Header, a.extraHeaderPrefixes.Value())
	// ← ConfigMap update fires here, atomic.Store replaces the bundle
	ClearAuthenticationHeaders(req.Header,
		a.nameHeaders, a.uidHeaders, a.groupHeaders, a.extraHeaderPrefixes)
	// ClearAuthenticationHeaders calls Value() again (atomic load #2: new config)
	// → clears headers named in the NEW config, not the ones read by load #1
	// → old header names remain uncleaned in req.Header
}
```

If the ConfigMap changes header names (e.g., `X-Remote-User` → `X-Custom-User`) between
loads #1 and #2:
- Authentication succeeds using the `X-Remote-User` header (old config)
- `ClearAuthenticationHeaders` deletes `X-Custom-User` (new config)
- `X-Remote-User` is left in `req.Header` — it passes downstream uncleaned
The downstream aggregated API server, if it also trust-proxies the X-Remote-User header,
would see the user identity a second time. In practice, configmap updates are rare and the
window is extremely short.
Recommended fix: Snapshot the provider value once and pass the snapshot to both the `AuthenticateRequest` body and `ClearAuthenticationHeaders`:

```go
nameHeaders := a.nameHeaders.Value()
// use nameHeaders (not a.nameHeaders) throughout
```

No public tracking issue or PR found as of 2026-05-05.
| File | staging/src/k8s.io/apiserver/pkg/authentication/token/cache/cached_token_authenticator.go:230–292 |
| Confidence | 0.30 — safe with all current Go stdlib hash implementations |
The code:
```go
// cached_token_authenticator.go:232–252
func keyFunc(hashPool *sync.Pool, auds []string, token string) string {
	h := hashPool.Get().(hash.Hash)
	h.Reset()
	// ...writes to h...
	key := toString(h.Sum(nil)) // ← unsafe alias
	hashPool.Put(h)             // ← pool returns h; Reset() may reuse internal buffer
	return key
}

// toString creates a string header pointing to the same memory as b
// without copying it:
func toString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}
```

`h.Sum(nil)` on Go's standard SHA256 implementation allocates a new slice for the result.
The string returned by toString therefore aliases that newly-allocated slice — safe.
The fragile invariant: If a future hash implementation (or a hash registered via a
crypto.RegisterHash plugin) returns a borrowed slice from Sum(nil) (an implementation
detail not prohibited by the hash.Hash interface), then after hashPool.Put(h) and a
subsequent h.Reset() by the pool, the memory backing key could be zeroed or overwritten.
The cache key would then silently become "" or a corrupted string, causing a cache miss or
a wrong cache hit.
Recommended fix:

```go
key := string(h.Sum(nil)) // standard allocation; the unsafe optimization is not justified here
```

The cache key is computed on every request miss and stored once in the LRU cache. The extra allocation is negligible.
No public tracking issue or PR found as of 2026-05-05.
| File | plugin/pkg/auth/authorizer/rbac/subject_locator.go:109–124 |
| Confidence | 0.25 — no current production caller relies on dedup |
The code:
```go
// subject_locator.go:109–124
dedupedSubjects := []rbacv1.Subject{}
for _, subject := range subjects {
	found := false
	for _, curr := range dedupedSubjects {
		if curr == subject {
			found = true
			break
		}
	}
	if !found {
		dedupedSubjects = append(dedupedSubjects, subject)
	}
}
return subjects, utilerrors.NewAggregate(errorlist)
// ↑ returns the original undeduped slice; dedupedSubjects is thrown away
```

`dedupedSubjects` is built correctly but `subjects` (the original undeduped list) is returned.
The O(n²) dedup loop is dead code. This is likely a bug introduced when the return was changed
without updating to reference dedupedSubjects.
Current impact: No production caller depends on deduplication; they dedup themselves. The dead code introduces maintenance confusion and a false sense of correctness.
Recommended fix: Return dedupedSubjects instead of subjects.
No public tracking issue or PR found as of 2026-05-05.
| File | staging/src/k8s.io/apiserver/pkg/authorization/cel/compile.go:289–303 |
| Confidence | 0.40 — both fields being set simultaneously is unusual |
The code (repeated pattern for both FieldSelector and LabelSelector):
```go
// compile.go:289–303
if len(obj.ResourceAttributes.FieldSelector.Requirements) > 0 {
	// builds requirements map and assigns:
	resourceAttributes[fieldSelectorVarName] = map[string]interface{}{"requirements": requirements}
}
if len(obj.ResourceAttributes.FieldSelector.RawSelector) > 0 {
	// overwrites the key just set above:
	resourceAttributes[fieldSelectorVarName] = map[string]interface{}{"rawSelector": obj.ResourceAttributes.FieldSelector.RawSelector}
}
```

If a SubjectAccessReview carries both Requirements and RawSelector (allowed by the API),
the second if block overwrites the fieldSelector map entry written by the first block.
CEL authorization expressions that inspect fieldSelector.requirements will evaluate against
an empty/nil value. The identical pattern appears for LabelSelector at lines 306–319.
Practical risk: Authorization webhook auditors or RBAC CEL policies that check field
selector requirements would silently receive empty requirements when RawSelector is also
set. Policy decisions would be based on incomplete selector information.
Recommended fix:

```go
fs := map[string]interface{}{}
if len(obj.ResourceAttributes.FieldSelector.Requirements) > 0 {
	fs["requirements"] = requirements
}
if len(obj.ResourceAttributes.FieldSelector.RawSelector) > 0 {
	fs["rawSelector"] = obj.ResourceAttributes.FieldSelector.RawSelector
}
if len(fs) > 0 {
	resourceAttributes[fieldSelectorVarName] = fs
}
```

No public tracking issue or PR found as of 2026-05-05.
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go:69–94, 126–133 |
| Confidence | 0.50 — race exists but collision requires two webhooks for same GroupVersionKind |
The map and the goroutines:
```go
// dispatcher.go:69–84 (versionedAttributeAccessor)
type versionedAttributeAccessor struct {
	versionedAttrs map[schema.GroupVersionKind]*admission.VersionedAttributes
	// no mutex
}

func (v *versionedAttributeAccessor) VersionedAttribute(gvk schema.GroupVersionKind) (...) {
	if val, ok := v.versionedAttrs[gvk]; ok { return val, nil } // concurrent read
	// ...
	v.versionedAttrs[gvk] = versionedAttr // concurrent write
	return versionedAttr, nil
}
```

```go
// dispatcher.go:126–133
wg := sync.WaitGroup{}
wg.Add(len(relevantHooks))
for i := range relevantHooks {
	go func(invocation *generic.WebhookInvocation, idx int) {
		// ...
		versionedAttr := versionedAttrAccessor.versionedAttrs[invocation.Kind] // unprotected read
```

When two webhook goroutines both need the same `invocation.Kind`, one reads while the other
may be writing. The Go race detector (-race, which our kubectl binary was built with) would
flag this. In production, the race could manifest as a panic from a concurrent map read/write.
Mitigating factors: In practice, most webhook configurations use distinct resource versions
per webhook. The race requires two webhooks with matching Kind in the same dispatch batch.
Recommended fix: Add a sync.Mutex to versionedAttributeAccessor, or pre-populate the
map in the serial phase before launching goroutines.
kubernetes/kubernetes#120507 (CLOSED) and kubernetes/kubernetes#122940 (CLOSED) both describe apiserver panics (`fatal error: concurrent map iteration and map write`) from this exact race during webhook failures. Fixed by PR #129472 (MERGED, milestone v1.34). This finding is already fixed in master.
LOW-6 — Webhook Dispatcher: Deferred Closure Captures Variables Before Assignment
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go:131–170 |
| Confidence | 0.75 |
The capture bug:

```go
// dispatcher.go:130–170
go func(invocation *generic.WebhookInvocation, idx int) {
	ignoreClientCallFailures := false // line 131 — zero value
	hookName := "unknown"             // line 132 — zero value
	defer wg.Done()
	defer func() { recover() }()
	defer utilruntime.HandleCrash(
		func(r interface{}) {
			// ↓ captures ignoreClientCallFailures and hookName by reference
			if ignoreClientCallFailures { // could be false (zero value)
				klog.Warningf("Panic calling webhook, failing open %v: %v", hookName, r) // hookName = "unknown"
				// ...fail-open path
				return
			}
			errCh <- apierrors.NewInternalError(...) // fail-closed path
		},
	)
	// ← defers are registered, but ignoreClientCallFailures and hookName
	// are not yet set; they're assigned at lines 169–170:
	hook, ok := invocation.Webhook.GetValidatingWebhook() // line 164
	// ...
	hookName = hook.Name                                        // line 169
	ignoreClientCallFailures = hook.FailurePolicy != nil && ... // line 170
```

If the goroutine panics between lines 132 and 169 (e.g., during `GetValidatingWebhook` or version negotiation), `HandleCrash` fires with:

- `hookName = "unknown"` — misleading audit logs
- `ignoreClientCallFailures = false` — the panic is treated as fail-closed (error returned), even if the webhook was configured with `FailurePolicy: Ignore`

This results in fail-closed behavior for a webhook that should fail open, which causes spurious admission denials when the webhook crashes before reading its own config.
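The hazard is ordinary Go closure semantics: a deferred function observes the values its captured variables hold when it runs, not when it was registered. A self-contained reduction (names mirror the dispatcher; the panic sites are invented):

```go
package main

import "fmt"

// callWebhook simulates the dispatcher goroutine: the crash handler is
// registered before the webhook's failure policy has been read.
func callWebhook(panicEarly bool) (decision string) {
	ignoreClientCallFailures := false // zero value at registration time
	hookName := "unknown"
	defer func() {
		if r := recover(); r != nil {
			// Captured by reference: sees whatever the variables hold *now*.
			if ignoreClientCallFailures {
				decision = "fail-open for " + hookName
			} else {
				decision = "fail-closed for " + hookName
			}
		}
	}()
	if panicEarly {
		panic("crash before config was read") // e.g. in GetValidatingWebhook
	}
	hookName = "example-hook"
	ignoreClientCallFailures = true // FailurePolicy: Ignore
	panic("crash after config was read")
}

func main() {
	fmt.Println(callWebhook(true))  // fail-closed for unknown
	fmt.Println(callWebhook(false)) // fail-open for example-hook
}
```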
Recommended fix: move the `hookName` and `ignoreClientCallFailures` assignments before the defer registration:

```go
hook, ok := invocation.Webhook.GetValidatingWebhook()
if !ok { return }
hookName := hook.Name
ignoreClientCallFailures := hook.FailurePolicy != nil && *hook.FailurePolicy == v1.Ignore
defer wg.Done()
defer func() { recover() }()
defer utilruntime.HandleCrash(func(r interface{}) {
	// now captures correct values
})
```

No public tracking issue or PR found as of 2026-05-05.
LOW-7 — Admission Validator: Unfiltered Namespace Passed to Audit Annotation Filter
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/validating/dispatcher.go |
| Related | staging/.../webhook/matchconditions/matcher.go:82 |
| Confidence | 0.80 |
When constructing CEL filter inputs for match conditions, the filtered (field-stripped) namespace is passed to the validation filter but the raw namespace is forwarded to the audit annotation filter. This asymmetry means an audit annotation CEL expression can read namespace fields (e.g., sensitive annotation values set by other controllers) that the validation CEL cannot.
This is a security boundary inconsistency, not a direct privilege escalation. It could allow audit annotations to inadvertently leak namespace metadata into audit logs.
Recommended fix: Apply the same namespace filter to both audit annotation and validation filter inputs for consistency.
No public tracking issue or PR found as of 2026-05-05.
LOW-8 — Namespace Lifecycle Admission: Inherent TOCTOU with Etcd
| File | staging/src/k8s.io/apiserver/pkg/admission/plugin/namespace/lifecycle/admission.go:115–165 |
| Confidence | 0.70 — well-understood limitation of the cache-based model |
The admission plugin reads namespace phase from a local watch cache, not from etcd directly:

```go
// admission.go:120–127
namespace, err := l.namespaceLister.Get(a.GetNamespace())
// ↑ returns a cached (potentially stale) namespace object
// forceLiveLookup fallback at lines 149–163 uses a live GET only if the
// namespace was previously known-Terminating. A namespace deleted between
// cache sync and this request is not caught by the forceLiveLookup path.
```

An object admitted during this window (namespace Active in cache, Terminating in etcd) persists in the terminating namespace. The namespace controller will eventually clean it up, but the window allows brief inconsistency.
This is a known architectural limitation of Kubernetes' optimistic concurrency model —
eliminating it would require a distributed lock for every admission decision. The code
already implements a forceLiveLookup path for the most common case (Terminating detection).
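The cache-plus-live-fallback shape can be sketched generically. This is a hypothetical reduction, assuming a `lifecycleChecker` with two maps standing in for the watch cache and etcd (the real plugin tracks recently-Terminating namespaces with a TTL cache):

```go
package main

import "fmt"

type phase string

const (
	active      phase = "Active"
	terminating phase = "Terminating"
)

// lifecycleChecker admits objects only into namespaces the cache believes
// are Active, with a live lookup for namespaces recently seen Terminating.
type lifecycleChecker struct {
	cache           map[string]phase // stale watch-cache view
	live            map[string]phase // authoritative etcd view
	forceLiveLookup map[string]bool  // namespaces recently seen Terminating
}

func (c *lifecycleChecker) admit(ns string) bool {
	if p, ok := c.cache[ns]; ok && p == active && !c.forceLiveLookup[ns] {
		return true // fast path: trusts the (possibly stale) cache
	}
	// slow path: consult etcd directly
	return c.live[ns] == active
}

func main() {
	c := &lifecycleChecker{
		cache:           map[string]phase{"a": active},
		live:            map[string]phase{"a": terminating}, // deleted moments ago
		forceLiveLookup: map[string]bool{},
	}
	// TOCTOU window: the cache still says Active, so admission succeeds.
	fmt.Println(c.admit("a")) // true
}
```

Once the namespace lands in `forceLiveLookup`, the same request would take the slow path and be rejected, which is exactly the "most common case" the plugin already covers.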
Recommended documentation: Add a code comment explicitly marking this as an acknowledged TOCTOU with a reference to the design decision.
No public tracking issue or PR found as of 2026-05-05.
LOW-9 — NodeAuthorizer Allows Nodes to CREATE ResourceSlices Without NodeName Validation [code verified]
| File | plugin/pkg/auth/authorizer/node/node_authorizer.go:337–352 |
| Confidence | 0.85 |
The code:

```go
// node_authorizer.go:337–352
func (r *NodeAuthorizer) authorizeResourceSlice(nodeName string, attrs authorizer.Attributes) (...) {
	// ...
	verb := attrs.GetVerb()
	switch verb {
	case "create":
		// The request must come from a node with the same name as the ResourceSlice.NodeName field.
		//
		// For create, the noderestriction admission plugin is performing this check.
		// Here we don't have access to the content of the new object.
		return authorizer.DecisionAllow, "", nil // ← unconditional allow for all nodes
	case "get", "update", "patch", "delete":
		return r.authorize(nodeName, sliceVertexType, attrs)
	// ...
	}
}
```

The comment is explicit: NodeName validation for ResourceSlice CREATE is entirely delegated to the NodeRestriction admission plugin. The NodeAuthorizer cannot inspect the request body (it only sees `attrs`, which carries metadata but not the object). This is a sound design choice — but it creates a defense-in-depth gap.
Risk scenario: if NodeRestriction is disabled (`--disable-admission-plugins=NodeRestriction`):

- Any authenticated node can CREATE a `ResourceSlice` with any `NodeName` field value
- A compromised node `node-A` can create slices claiming to represent device allocations for `node-B`
- DRA scheduler and kubelet on `node-B` may act on phantom device advertisements from `node-A`
Mitigating factors: NodeRestriction is enabled by default and disabling it is explicitly
documented as reducing the node security boundary. The CSINode and lease authorizers have
the same pattern (see lines 299–300, 327–329 in node_authorizer.go).
Recommended fix: document the dependency more prominently. Consider adding a startup warning when DRA (the `DynamicResourceAllocation` feature gate) is active and NodeRestriction is disabled.
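A sketch of that warning check as a pure helper (hypothetical; real wiring would live in apiserver startup and consult the actual feature gate and enabled-admission-plugin sets):

```go
package main

import "fmt"

// draWarning returns a startup warning when DRA is enabled but the
// NodeRestriction admission plugin (which enforces ResourceSlice.NodeName
// on CREATE) is not. An empty string means no warning.
func draWarning(draEnabled, nodeRestrictionEnabled bool) string {
	if draEnabled && !nodeRestrictionEnabled {
		return "DynamicResourceAllocation is enabled but NodeRestriction is disabled: " +
			"nodes can create ResourceSlices with arbitrary NodeName values"
	}
	return ""
}

func main() {
	fmt.Println(draWarning(true, false) != "") // true — this combination warrants a warning
	fmt.Println(draWarning(true, true) != "")  // false
}
```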
No public tracking issue or PR found as of 2026-05-05.
LOW-10 — Mutating Webhook Reinvocation: reinvokeWebhooks Set Is Correct
| Original finding | reinvokeRequested flag not cleared between admission rounds |
| Assessment after code review | Finding is not confirmed |
Original researcher described a `reinvokeRequested` boolean that isn't cleared. The actual code uses a `sets.Set[string]` (`reinvokeWebhooks`) populated by `RequireReinvokingPreviouslyInvokedPlugins()` from `previouslyInvokedReinvocableWebhooks`. The flow is correct:

- First pass: `previouslyInvokedReinvocableWebhooks` accumulates webhook UIDs
- On mutation: `RequireReinvokingPreviouslyInvokedPlugins()` copies them into `reinvokeWebhooks` and clears the source
- Second pass: `ShouldReinvokeWebhook(uid)` checks `reinvokeWebhooks` — only correctly flagged webhooks are reinvoked

The `reinvokeWebhooks` set is not cleared before the second pass, but that is intentional — it IS the second-pass inclusion list. Finding revised to not a bug; no fix needed.
No tracking issue applicable — finding is not a bug.
Dynamic Harness — kubectl Status
Harness results directory: /Users/dsrinivas/go/src/k8s.io/kubernetes/results/kubectl-k8s/20260505T135140Z/
├── found_bugs.jsonl ← empty (0 crashes confirmed so far)
├── recon_transcript.jsonl ← auto-focus completed, assigned focus areas
├── run_000/find_transcript.jsonl (109 entries, still running)
├── run_001/find_transcript.jsonl (137 entries, still running)
└── run_002/find_transcript.jsonl (139 entries, still running)
Observed agent behavior:
- run_001 (quantity parsing): tried `memory=1e9223372036854775807`, `memory=1e-9223372036854775808`, and `memory=9999999999999999999e9999999999999999999`. All returned `unable to parse quantity's suffix`, exit 1 — graceful error handling, no panic. The `k8s.io/apimachinery/pkg/api/resource` quantity parser appears well-hardened against exponent overflow.
- run_002 (kubectl cp): was reading `cp.go` source to identify tar extraction entry points. Key function: `(*CopyOptions).untarAll` — path traversal check at line ~327 uses `filepath.Clean` + prefix check. Whether the prefix check is bypassable with symlinks or special archive entries is still under investigation.
- run_000 (kubeconfig): exploring client certificate handling paths.
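The guard under investigation has this general shape (a hypothetical reduction, not kubectl's actual `untarAll`; the separator suffix on the prefix check is the usual subtlety):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isWithin reports whether joining entryName under destDir stays inside
// destDir after lexical cleaning. This is the Clean-plus-prefix pattern;
// note it is purely lexical and cannot see symlinks planted by earlier
// archive entries, which is why such checks need separate symlink handling.
func isWithin(destDir, entryName string) bool {
	target := filepath.Clean(filepath.Join(destDir, entryName))
	dest := filepath.Clean(destDir)
	return target == dest || strings.HasPrefix(target, dest+string(filepath.Separator))
}

func main() {
	fmt.Println(isWithin("/tmp/out", "a/b.txt"))        // true
	fmt.Println(isWithin("/tmp/out", "../etc/passwd"))  // false
	fmt.Println(isWithin("/tmp/out", "a/../../secret")) // false
}
```

Without the trailing separator in the prefix check, an entry landing in a sibling directory like `/tmp/out-evil` would incorrectly pass, which is one of the classic bypasses a reviewer looks for here.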
Why no crashes yet: kubectl is a client-side tool with aggressive input validation. Most malformed inputs return `error: ...` and exit 1. The race detector requires CGO, which was disabled for cross-compilation — data races in kubectl internals would not surface here.
High-value targets (kubectl cp path traversal, deeply nested YAML stack overflow) require
more targeted PoC construction than simple fuzzing.
| Finding | Issue / PR | Status |
|---|---|---|
| MEDIUM-1 | None found | Self-acknowledged in code (controller.go:315); no public tracking ticket |
| MEDIUM-2 | None found | Novel; fix on branch fix/graph-populator-extended-resource-claim |
| MEDIUM-3 | #115523 | CLOSED (kind/support) — intentional, documented behavior; confirmed by liggitt |
| LOW-1 | None found | — |
| LOW-2 | None found | — |
| LOW-3 | None found | — |
| LOW-4 | None found | — |
| LOW-5 | #120507, #122940 | Already fixed in PR #129472 (MERGED, v1.34) |
| LOW-6 | None found | — |
| LOW-7 | None found | — |
| LOW-8 | None found | — |
| LOW-9 | None found | — |
| LOW-10 | N/A | Not a bug |
Risk Priority Matrix
| Priority | Action | Finding | Effort |
|---|---|---|---|
| P1 — immediate | Audit all webhook configurations for `resources: ["*"]` without `"*/*"` | MEDIUM-3 | Low (one-liner detection) |
| P1 — immediate | Fix `graph_populator.go:114` to include `PodExtendedStatusEqual` | MEDIUM-2 | Low (one-line fix) |
| P2 — short-term | Add integration test: ResourceClaim update with no `ResourceClaimStatuses` change → node can still read claim | MEDIUM-2 | Medium |
| P2 — short-term | Investigate quota batching TOCTOU; add etcd CAS for `UpdateQuotaStatus` | MEDIUM-1 | High |
| P3 — backlog | Emit webhook warning on `resources: ["*"]` without `"*/*"` | MEDIUM-3 | Medium |
| P3 — backlog | Fix `versionedAttrs` map concurrent access with mutex | LOW-5 | Low |
| P3 — backlog | Hoist `hookName`/`ignoreClientCallFailures` before defers | LOW-6 | Low |
| P3 — backlog | Add sync/atomic snapshot in requestheader `AuthenticateRequest` | LOW-1 | Low |
| P4 — housekeeping | Replace `unsafe.String(h.Sum(nil))` with `string(h.Sum(nil))` | LOW-2 | Trivial |
| P4 — housekeeping | Return `dedupedSubjects` in `AllowedSubjects` | LOW-3 | Trivial |
| P4 — housekeeping | Fix CEL fieldSelector/labelSelector double-write in `compile.go` | LOW-4 | Low |
| P4 — housekeeping | Uniform namespace filter to both admission filters | LOW-7 | Low |
| P5 — documentation | Add comment acknowledging namespace lifecycle TOCTOU and link to design decision | LOW-8 | Trivial |
| P5 — documentation | Add startup warning: NodeRestriction disabled + DRA active | LOW-9 (revised) | Low |
Scope and Limitations
Included:

- `plugin/pkg/auth/authenticator/` — request header, token, x509
- `staging/src/k8s.io/apiserver/pkg/authentication/` — token cache, OIDC, bootstrap
- `plugin/pkg/auth/authorizer/` — RBAC, node, union
- `staging/src/k8s.io/apiserver/pkg/authorization/` — CEL, webhook
- `plugin/pkg/admission/` — resourcequota, security context
- `staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/` — validating, mutating, reinvocation
- `staging/src/k8s.io/apiserver/pkg/admission/plugin/namespace/lifecycle/`
- `plugin/pkg/auth/authorizer/node/` — NodeAuthorizer, graph, graph_populator
- `kubectl cp`, `kubectl apply`/`create`, `--kubeconfig` (dynamic harness)
Excluded (out of scope):

- `vendor/` — third-party dependencies
- `**/*.pb.go` — protobuf-generated code
- `**/zz_generated_*.go` — generated deepcopy/defaulting/conversion
- `third_party/` — bundled external code
- `pkg/kubelet/` — node agent (cgroup access required for faithful assessment)
- `test/` and `**/*_test.go` — test code
- Full kube-apiserver startup path (requires etcd + TLS bootstrap)
Dynamic harness caveats:
- CGO disabled for cross-compilation (Linux/arm64 from macOS host) — Go race detector not available. Data races in kubectl internals are not surfaced. Static analysis partially compensates.
- Network-dependent code paths (live API server, etcd) are out of scope.
- Results were incomplete at report publication time; `found_bugs.jsonl` will be updated if crashes land after this report is generated.
Report generated: 2026-05-05
Assessment: automated multi-agent (vuln-harness-scan) + source code cross-verification
Contact: dsrinivas@nvidia.com