Widespread generative AI (GenAI) adoption, powered by billion-dollar investments in GPU-based infrastructure, is forging a new and invisible attack surface.
As enterprises migrate their most valuable intellectual property to cloud-based AI workloads, they are placing their trust in an advanced generation of cloud native application protection platforms (CNAPPs). While many of these tools are mature and sophisticated, they were architected for a CPU-centric world and are blind to the critical computations that define the modern era.
Leading security vendors have built their platforms on extended Berkeley Packet Filter (eBPF), a technology that provides real-time visibility into the host operating system.
However, these solutions work via kernel-mediated events and are blind to operations occurring on the GPU. This is not a minor feature gap or something that can be patched; it is a foundational limitation in today’s cloud and AI security model, and one that requires a different approach and positioning in the software stack.
eBPF and the Limits of Runtime Security
To understand the gravity of the gap, it is important to appreciate the technological leap that eBPF represents and why it became the undisputed industry standard for runtime security. The move to eBPF was not merely an incremental improvement but a necessary response to the inherent instability of its predecessors.
For years, security products gained deep system visibility through kernel modules. These modules operate with the highest level of privilege (ring 0), allowing them to intercept system calls, monitor network traffic and block malicious processes with ultimate authority. But because they are integrated directly into the core of the operating system, a single bug in a third-party kernel module shipped by a security vendor could destabilize the entire system, leading to the infamous “blue screen of death” on Windows or a “kernel panic” on Linux.
The CrowdStrike outage of July 2024 provided a stark, real-world demonstration of the dangers of directly interacting with kernels. A faulty update to a kernel-level component caused millions of Windows machines worldwide to crash. Even before this incident, platform vendors like Microsoft had been working for years to develop safer, standardized APIs to reduce the need for such invasive techniques.
eBPF emerged as the solution: it allows developers to run small, sandboxed programs directly within the Linux kernel without risking system instability. Its core innovation is the in-kernel verifier, a checkpoint that analyzes an eBPF program’s code before it is loaded. The verifier ensures the program is safe by confirming it will eventually terminate, cannot access arbitrary memory and will not cause a kernel crash.
This sandboxed programmability unlocked the best of both worlds: deep visibility into kernel-mediated events, network socket operations, process creation and filesystem access without compromising system stability.
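The verifier’s guarantees can be illustrated with a toy model. The sketch below is a drastic simplification (the real verifier simulates every execution path and bounds-checks every register and memory access, and the instruction format here is invented for illustration), but it captures the core idea: programs the checker cannot prove safe are rejected before they ever touch the kernel.

```python
# Toy model of the eBPF verifier's pre-load safety check.
# Assumption: heavily simplified -- real eBPF bytecode and
# verification are far richer than this illustrative sketch.
def toy_verify(program):
    """Reject programs with a backward jump (possible unbounded loop)
    or an unchecked memory access, mirroring the two guarantees
    described above: termination and memory safety."""
    for pc, (op, arg) in enumerate(program):
        if op == "jmp" and arg <= pc:
            return False  # back-edge: termination cannot be proven
        if op == "load" and arg is None:
            return False  # no verified address: unsafe memory access
    return True

safe_program = [("load", 0x1000), ("add", 1), ("exit", None)]
looping_program = [("load", 0x1000), ("jmp", 0)]
```

Here `toy_verify(safe_program)` passes, while `toy_verify(looping_program)` is rejected for its back-edge, just as the in-kernel verifier refuses programs whose termination it cannot prove.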
Recognizing its potential, the cloud security industry rapidly rallied around eBPF as the new standard. Market leaders in cloud security, such as Palo Alto Networks, Wiz and CrowdStrike, built their flagship runtime protection offerings on its foundation, marketing it as the key to safe, comprehensive visibility.
This industrywide adoption, however, was predicated on an assumption that is now being invalidated by the rise of AI: that all meaningful computational activity is visible to the host kernel. While eBPF provides a powerful lens into the CPU-centric world, it is blind to the new universe of computation happening on the GPU.
Why GPUs Are a Security Black Box on the PCIe Bus
The failure of eBPF to monitor GPU workloads is not a bug or an oversight; it is a direct consequence of the architectural design of modern accelerated computing. GPUs achieve their performance by operating as autonomous systems that bypass the very kernel-mediated pathways that eBPF relies on for observation.
A modern GPU is not a simple add-on that executes instructions on behalf of the CPU. It is a highly specialized, massively parallel computer in its own right. It possesses its own distinct instruction set, such as NVIDIA’s PTX, its own complex scheduler for managing thousands of concurrent threads, its own multilevel memory hierarchy (VRAM), and its own control logic managed by on-chip firmware.
Code written in a framework like CUDA is compiled into a binary “kernel.” This binary is not executed by the host CPU. Instead, user space libraries (such as libcuda.so) and the GPU driver package this kernel and its associated data, sending it to the GPU for execution. From that point forward, the entire computational process occurs entirely within the GPU, governed by its firmware and completely independent of the host operating system’s kernel.
This architecture creates four key barriers to gaining visibility into the GPU execution environment.
Barrier 1: Why GPU Execution Escapes Syscall Monitoring
eBPF derives its power from its ability to attach to hooks within the kernel, primarily those associated with system calls. When a traditional application wants to perform an action like opening a file or sending a network packet, it makes a syscall, creating an observable event.
However, when a host application launches a CUDA kernel on a GPU, the host OS kernel only sees a single, high-level ioctl() (Input/Output Control) call made to the GPU driver. This call is opaque; it essentially says, “GPU, please run this job.” It does not provide any details about what the job is. The billions of instructions that the GPU subsequently executes, the memory it accesses within its own VRAM and the complex control flow of the AI model itself generate no further syscalls, software interrupts or kernel context switches for eBPF to monitor.
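The shape of this blind spot can be seen from user space. The sketch below issues a benign ioctl (FIONREAD on a pipe) purely to illustrate the call’s structure; the request code and buffer are stand-ins for the proprietary command structures a GPU driver defines. From a syscall tracer’s perspective, a CUDA launch looks much like this: one call, one opaque argument.

```python
# Illustrative only: FIONREAD on a pipe stands in for the opaque,
# driver-defined ioctl that a GPU kernel launch reduces to at the
# syscall layer. Linux-specific (fcntl/termios).
import fcntl
import os
import struct
import termios

def kernel_visible_surface() -> int:
    r, w = os.pipe()
    try:
        os.write(w, b"opaque command buffer")  # stand-in for a GPU job
        # To eBPF, this is a single event: ioctl(fd, request, argp).
        # Everything meaningful lives inside the argument buffer,
        # which the host kernel passes through without interpreting.
        raw = fcntl.ioctl(r, termios.FIONREAD, struct.pack("i", 0))
        (queued,) = struct.unpack("i", raw)
        return queued  # bytes sitting in the pipe
    finally:
        os.close(r)
        os.close(w)
```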
Barrier 2: DMA Data Paths Bypass Kernel Security
To achieve the high bandwidth necessary for AI, GPUs communicate with the host system’s main memory using direct memory access (DMA). DMA allows the GPU to read and write data directly to and from system RAM over the PCIe bus, completely bypassing the CPU and kernel visibility. This mechanism is essential for performance, as it avoids the overhead of having the CPU act as an intermediary for massive data transfers like loading model weights or input tensors.
From a security perspective, however, DMA is a gaping hole in visibility. These memory operations do not trigger syscalls, page faults or any other event that can be traced with existing tools.
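An analogy from ordinary systems programming makes the observability problem concrete. Memory-mapped I/O is not DMA, but it shares the property that matters here: after a one-time setup call, data moves as plain loads and stores that generate no syscalls for a tracer to hook. The sketch below is purely illustrative.

```python
# Analogy only: mmap is not DMA, but both move data without
# per-transfer syscalls once the mapping/channel is established.
import mmap
import os
import tempfile

def syscall_free_transfer(payload: bytes) -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as view:
            # One mmap() call set this region up; from here on, a
            # syscall tracer sees nothing -- just as eBPF sees nothing
            # of the GPU's DMA reads and writes over the PCIe bus.
            view[:] = payload       # memory store, not write()
            return bytes(view[:])   # memory load, not read()
    finally:
        os.close(fd)
        os.unlink(path)
```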
Barrier 3: Closed GPU Drivers Limit Security Visibility
Existing solutions depend on the kernel being instrumented with a rich set of hooks that monitoring tools can attach to. Proprietary GPU drivers, including NVIDIA’s nvidia.ko module and similar offerings from other vendors, are typically distributed as closed-source binary modules. They operate as black boxes, lacking the public instrumentation hooks that security tools need for granular insight into GPU operations.
While some introspection capabilities exist through vendor-provided tools like nvidia-smi, these are generally poll-based systems that expose high-level performance counters rather than providing the real-time, event-driven security telemetry that would enable detection of specific malicious behaviors similar to what eBPF provides for other kernel subsystems.
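The difference is easy to see in practice. The sketch below polls nvidia-smi’s query interface (the command-line flags are real; the parsing helper and the sample values used to exercise it are ours): each call returns a point-in-time counter snapshot, so anything that happens between polls simply never appears.

```python
# Poll-based visibility: a snapshot of counters, not a stream of events.
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_gpu_sample(csv_line: str) -> dict:
    """Parse one polled CSV sample. A malicious kernel that launches
    and exits between two polls leaves no trace in this data."""
    util, mem = (field.strip() for field in csv_line.split(","))
    return {"utilization_pct": int(util), "memory_used_mib": int(mem)}

def poll_once() -> dict:
    # Requires an NVIDIA driver; on other machines this call fails.
    out = subprocess.check_output(QUERY, text=True)
    return parse_gpu_sample(out.splitlines()[0])
```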
Barrier 4: GPU-to-GPU Traffic Is Invisible to Security Tools
Large-scale AI training involves dozens, hundreds or thousands of GPUs working in parallel. To support this requirement, technologies like NVIDIA’s Collective Communications Library (NCCL) and GPUDirect enable GPUs to communicate directly with each other, either over dedicated high-speed interconnects like NVLink or peer-to-peer across the PCIe bus. This traffic never passes through the host CPU or its main memory.
Because of this, the host kernel has no visibility into these GPU-to-GPU data transfers, making it impossible for CNAPPs and runtime security tools to detect threats like data leakage or malicious interference between GPUs in a multitenant training environment.
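To see what the host misses, consider the collective operation at the heart of distributed training: all-reduce. The toy simulation below (plain Python standing in for NCCL, with one scalar per rank for brevity) shows the traffic pattern. Every hop is a direct transfer between neighboring ranks, which on real hardware rides NVLink or PCIe peer-to-peer and never crosses the host kernel.

```python
# Toy ring all-reduce: each hop is a neighbor-to-neighbor transfer.
# On real clusters NCCL moves tensor chunks this way over NVLink or
# PCIe, generating no host-kernel events a CNAPP could observe.
def ring_allreduce(values):
    """Return the all-reduced (summed) value as held by every rank."""
    n = len(values)
    acc = list(values)   # running sum accumulated at each rank
    msg = list(values)   # value currently in flight from each rank
    for _ in range(n - 1):
        # Simultaneous ring hop: rank r receives from rank (r - 1) % n.
        received = [msg[(r - 1) % n] for r in range(n)]
        for r in range(n):
            acc[r] += received[r]
        msg = received   # each rank forwards what it just received
    return acc
```

After n - 1 hops every rank holds the same total, yet from the host’s vantage point none of those transfers ever happened.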
The cumulative effect of these barriers is striking: the architectural choices made to optimize GPUs for performance are precisely what render them invisible to a security paradigm built on kernel-level observation.
FAQs
Q1: Why can’t eBPF secure GPU workloads?
A: eBPF relies on kernel hooks, but GPU execution bypasses the kernel, leaving no traceable syscalls or events.
Q2: What risks do blind GPU workloads create for enterprises?
A: They expose AI models and IP to data leakage, malicious interference and compliance failures without detection.
Q3: How should enterprises secure GPU workloads?
A: By adopting hardened runtime isolation and infrastructure designed for AI, not retrofitted CPU-centric tools.
A version of this blog was previously published on The New Stack on August 4, 2025.