Edera at the Edge: Running Untrusted Workloads on Kernels You Cannot Fix
The Edge Is Not a Smaller Cloud
The industry still talks about edge computing as if it were a smaller, slightly constrained version of the cloud. That framing is convenient, but it is wrong in the ways that matter. The defining characteristic of the edge is not limited CPU or intermittent connectivity. It is the combination of weak operational control and long-lived systems running unpatched kernels that do not evolve at the pace of the software running on them.
Once you look at real edge deployments, the abstraction breaks down quickly. You are not dealing with homogeneous fleets that can be drained, rebuilt, and rolled forward. You are dealing with machines that are physically distributed, often remotely managed, and frequently tied to hardware, firmware, and vendor constraints that make upgrades difficult or undesirable. The operating system on these systems is not something you continuously refresh. It is something you inherit, stabilize, and then avoid touching unless absolutely necessary.
Edge Operating Systems: Old Kernels You Cannot Fix
That leads to a situation that is rarely stated explicitly. A significant portion of edge infrastructure runs kernels that are old, heavily patched out of tree, or coupled to vendor-specific drivers and firmware in ways that prevent them from evolving independently of the applications they run. Security fixes, when they exist, are backported selectively and sometimes incompletely. Observability into kernel behavior is limited, and the ability to reason about the correctness of that kernel is even more limited. The system is considered stable precisely because it does not change, not because it is well understood or secure.
This is not an operational failure. It is a consequence of how these systems are designed and deployed. Industrial systems, retail infrastructure, and telco platforms prioritize continuity of operation over continuous upgrade. The result is a kernel that accumulates history without shedding complexity, and that complexity becomes part of the runtime environment for every workload on the system.
Workloads Have Changed, the Substrate Has Not
At the same time, the workload model on these systems has shifted. Devices that used to be single-purpose appliances are now expected to run multiple classes of software. Vendor agents, telemetry pipelines, update mechanisms, application logic, and increasingly machine learning inference all coexist on the same node. In many cases, these components are built by different teams, released on different cadences, and operate under different trust assumptions.
The system may still be described as single-tenant, but in practice it behaves like a multi-tenant environment with shared infrastructure and weak boundaries. The number of independent actors interacting with the system has increased, but the workload isolation model has not evolved to match that increase.
Why Containers Fail When the Kernel Is Out of Your Control
Containers were not designed for this environment. The container model assumes that the kernel is part of the trusted computing base (TCB) and that it can be treated as a correct and enforceable boundary. Namespaces, cgroups, seccomp, and LSMs all operate within that kernel. They provide isolation only to the extent that the kernel enforces those abstractions correctly and consistently.
In a well-managed cloud environment, that assumption is already under pressure. At the edge, it is often invalid. The kernel is not continuously updated, not uniformly configured, and not easily observable. Treating it as a reliable enforcement point is a mismatch between the model and the environment.
Kernel Sharing Expands the Blast Radius
When multiple workloads share a kernel, the isolation boundary is effectively the kernel itself. A local privilege escalation inside one workload is not contained within that workload. It becomes a path to node-level compromise. Even in the absence of a known vulnerability, the kernel surface area is large enough that subtle interactions between subsystems can produce behavior that is difficult to predict under adversarial conditions.
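The difference in blast radius can be made concrete with a deliberately simplified model. The workload names and the compromise rule below are illustrative, not drawn from any real deployment:

```python
# Toy model: how far a kernel-level compromise propagates under a
# shared-kernel layout versus a per-workload-kernel layout.
# Workload names and topology are hypothetical.

def blast_radius(workloads, kernel_of, compromised_kernel):
    """Return the set of workloads affected when a given kernel is compromised."""
    return {w for w in workloads if kernel_of[w] == compromised_kernel}

workloads = ["vendor-agent", "telemetry", "inference"]

# Shared-kernel layout: every workload issues syscalls into one host kernel.
shared = {w: "host-kernel" for w in workloads}

# Per-workload layout: each workload carries its own kernel.
per_workload = {w: f"kernel-{w}" for w in workloads}

# A privilege escalation reached through "telemetry" compromises its kernel.
shared_impact = blast_radius(workloads, shared, "host-kernel")
isolated_impact = blast_radius(workloads, per_workload, "kernel-telemetry")
```

Under the shared layout the affected set is every workload on the node; under the per-workload layout it is the single workload whose kernel was compromised.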
Features such as asynchronous I/O, memory management interfaces, filesystem implementations, and device drivers introduce complexity that accumulates over time. On a system that is rarely updated and not deeply observable, that complexity becomes risk. New user space interacting with old kernel subsystems creates mismatches in expectations about behavior, performance, and safety. Vendor patches may alter those subsystems in ways that are undocumented or poorly understood.
The result is a system where the effective security boundary is both large and opaque.
Why Seccomp and LSMs Don't Reduce the Kernel Attack Surface
Adding more policy inside the kernel does not fundamentally change this. Tightening seccomp profiles or refining LSM rules can reduce exposure to known interfaces, but it does not reduce the size of the trusted computing base. It assumes that the kernel continues to behave correctly and that policy enforcement itself is not bypassed.
When the kernel is the thing you do not trust or cannot fix, placing more responsibility on it does not improve the situation. It increases reliance on a component that is already the weakest part of the system.
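The distinction between narrowing the reachable interface and shrinking the trusted computing base can be sketched in a few lines. The syscall names and allowlist below are illustrative, not a real seccomp profile:

```python
# Sketch of what a seccomp-style allowlist does and does not change.
# Syscall names and the allowlist are illustrative, not a real profile.

LINUX_SYSCALLS = {"read", "write", "openat", "mmap",
                  "io_uring_setup", "userfaultfd", "clone3"}
ALLOWLIST = {"read", "write", "openat", "mmap"}

def filter_syscall(name):
    """Model of an allowlist filter: deny anything not explicitly permitted."""
    return "allow" if name in ALLOWLIST else "deny"

# The reachable interface shrinks to the allowlist...
reachable = {s for s in LINUX_SYSCALLS if filter_syscall(s) == "allow"}

# ...but every allowed call still executes the same shared kernel code,
# and the filter itself is enforced by that same kernel.
trusted_computing_base = "host-kernel"  # unchanged by the policy
```

The point of the sketch is the last two lines: the policy removes entry points, but both the remaining entry points and the enforcement of the policy live in the component being distrusted.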
Moving the Boundary: One Workload, One Kernel
The alternative is to change the boundary. Edera’s model places each workload in its own kernel, running in an isolated execution context backed by hardware virtualization. Today, this model is implemented on Xen, and a KVM-based implementation is being developed with the same isolation principles. The underlying hypervisor changes, but the boundary does not.
The host kernel remains present and shared across the system. What changes is not its existence, but its position in the execution path. Workloads no longer issue syscalls into the host kernel. Instead, they interact with their own kernel, which interacts with a virtual machine monitor, which in turn interacts with the host.
This removes the host kernel from the direct syscall interface exposed to workloads. Its attack surface is therefore no longer the full Linux syscall and subsystem surface, but a mediated interface defined by the virtualization boundary. That interface is structurally narrower, but it is not inherently safe. It includes hypercalls in Xen-based systems, and ioctl- and virtio-mediated paths in KVM-based systems, along with interrupt delivery and memory management operations. These paths have had real, exploitable vulnerabilities in practice across both Xen and KVM, and remain part of the attack surface.
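The change in the execution path can be summarized as a sketch. Component names are illustrative, and the interface sizes are rough orders of magnitude rather than exact figures:

```python
# Sketch of the execution-path difference between the two models.
# Component names are illustrative; interface sizes are rough estimates.

def path_shared_kernel():
    # Container model: a workload syscall lands directly in the host kernel.
    return ["workload", "host-kernel"]

def path_per_workload_kernel():
    # Per-workload-kernel model: the host kernel is no longer on the direct
    # syscall path; it is reached only through mediated interfaces.
    return ["workload", "guest-kernel", "vmm-or-hypervisor", "host-kernel"]

# Approximate shape of the interface exposed at each hop.
INTERFACE_AT_HOP = {
    ("workload", "host-kernel"): "hundreds of syscalls plus ioctls, /proc, /sys",
    ("workload", "guest-kernel"): "hundreds of syscalls, contained to one workload",
    ("guest-kernel", "vmm-or-hypervisor"): "dozens of hypercalls / VM-exit paths",
}
```

The host kernel still appears at the end of the second path; what changes is that the broad syscall surface a workload touches directly belongs to its own kernel, while the shared component sees only the mediated interface.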
From a systems perspective, this shifts the isolation boundary from within a kernel to between kernels, while reducing how directly workloads interact with the host kernel. Beyond isolation, this model also enables declarative kernel management — workloads can declare and carry the exact kernel they require. This alleviates a separate class of pressure that unpatched host kernels create: ABI mismatches and behavioral drift between what applications expect and what the host kernel provides.
The Hypervisor Becomes the Enforced Boundary
The isolation boundary in this model is enforced by the hypervisor or virtual machine monitor. This is a shared component, and it is part of the trusted computing base. The difference is not that the shared dependency disappears, but that it changes form.
A general-purpose Linux kernel exposes a very large and evolving syscall and subsystem surface to untrusted workloads. A virtualization layer exposes a more constrained and structured interface centered on memory mapping, CPU scheduling, interrupt delivery, and device mediation. That interface is structurally smaller, but its components run with elevated privilege and have historically contained exploitable bugs.
In Xen-based systems, this boundary is enforced through hypercalls and domain isolation mechanisms. The Xen hypervisor proper has a relatively constrained interface compared to a full Linux kernel, but Xen’s dom0 model introduces an additional consideration. dom0 is technically just another virtual machine, but one that typically runs a privileged Linux kernel with broad access to hardware and control over device backends. If dom0 is itself stale or unpatched, it remains a high-value component in the trusted computing base. The isolation properties of Xen therefore depend not just on the hypervisor, but also on how dom0 is minimized, constrained, or in some designs reduced or eliminated as a general-purpose environment.
In KVM-based systems, the boundary is split between the KVM subsystem in the host kernel and the user-space VMM. The KVM subsystem is part of the host kernel itself, which means vulnerabilities in that subsystem are still reachable through the virtualization interface. On the class of edge systems described earlier, where the host kernel may be old or unpatched, this creates a circular dependency: the mechanism used to reduce kernel exposure still relies on that same kernel for enforcement.
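The circular dependency can be made explicit as a small dependency chain. The component names and edges below are a simplified model of a KVM-based stack, not a description of any specific implementation:

```python
# Illustrative dependency model of the KVM case: the component used to
# reduce exposure to the host kernel is itself part of that kernel.
# Component names and edges are a simplification.

DEPENDS_ON = {
    "workload": "guest-kernel",
    "guest-kernel": "vmm",            # user-space VMM (e.g. rust-vmm based)
    "vmm": "kvm-subsystem",           # ioctls against /dev/kvm
    "kvm-subsystem": "host-kernel",   # KVM lives inside the host kernel
}

def enforcement_chain(component):
    """Walk the dependency chain until it bottoms out."""
    chain = [component]
    while component in DEPENDS_ON:
        component = DEPENDS_ON[component]
        chain.append(component)
    return chain

# The chain for a workload ends at the host kernel: the enforcement
# mechanism relies on the same kernel it is meant to reduce exposure to.
chain = enforcement_chain("workload")
```

Walking the chain from `"workload"` ends at `"host-kernel"`, which is the circularity the paragraph above describes: on a stale host kernel, the enforcement point inherits that staleness.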
Virtio-based device backends further extend this surface. Guest workloads interact with virtio frontends that ultimately drive host-side backend implementations, either in the kernel or in the VMM. These backends process guest-controlled descriptors and buffers, and have historically been a source of vulnerabilities. The virtualization boundary therefore includes not just hypercalls and VM exits, but also device emulation and mediation logic that must be treated as part of the TCB.
Minimal VMM implementations, including those built on rust-vmm, deliberately constrain the user-space portion of this surface by avoiding large device models and legacy emulation paths. This is not true of all KVM-based deployments. Systems built on full QEMU device models retain significantly larger emulation surfaces and therefore a larger TCB. QEMU is not merely a large device model; it is a userspace program that parses attacker-controlled byte streams into complex state transitions, which substantially increases the practical exploitability of that surface.
The model is therefore not “no shared trust.” It is “shared trust in a smaller, more controlled component,” with the explicit assumption that this component is actively maintained, auditable, and simpler than the kernel surface it replaces. If the hypervisor, dom0, or VMM is as stale and unmaintained as the host kernel it sits alongside, the argument weakens accordingly.
Fault Containment Becomes a Property, Not an Assumption
The immediate consequence is that kernel-level faults are contained within the scope of a single workload. A vulnerability exploited inside one kernel does not automatically translate into control over other workloads or the host. This does not eliminate vulnerabilities. It changes their impact.
On edge systems where kernels may remain unpatched for extended periods, reducing the blast radius of a compromise is often more practical than attempting to eliminate all possible exploits. Containment becomes a property of the system rather than an assumption about correct behavior.
Decoupling Workload Kernel Lifecycle from Host Evolution
This model also decouples workload evolution from host evolution. The host system can remain stable, even static, while workloads run kernels that are selected, updated, and validated independently. Newer kernels can be used for workloads that require them without requiring a full system upgrade.
This is particularly relevant in environments where reboots are costly, coordination across devices is difficult, or hardware constraints prevent uniform upgrades. The kernel lifecycle becomes a per-workload concern rather than a system-wide constraint.
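A per-workload kernel lifecycle can be sketched as data. The field names and version strings below are hypothetical, not any real product's schema:

```python
# Hypothetical per-workload kernel declarations. Field names, versions,
# and workload names are illustrative, not a real schema.

HOST_KERNEL = "4.19.0-vendor-patched"   # inherited, rarely touched

workload_kernels = {
    "inference":    {"kernel": "6.6.30",  "reason": "needs recent driver ABI"},
    "telemetry":    {"kernel": "5.15.160", "reason": "validated LTS baseline"},
    "vendor-agent": {"kernel": "5.10.210", "reason": "vendor-qualified build"},
}

def kernel_for(workload):
    """Resolve the kernel a workload boots with, independent of the host."""
    return workload_kernels[workload]["kernel"]

# Upgrading one workload's kernel is a per-workload change, not a
# system-wide event: the host kernel is untouched and no node reboot
# is implied by the model.
workload_kernels["inference"]["kernel"] = "6.6.31"
assert HOST_KERNEL == "4.19.0-vendor-patched"
```

The design point is that the mapping is per workload: each entry can be selected, updated, and validated on its own cadence while the host stays static.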
GPU and Device Passthrough at the Edge: What Isolation Does and Doesn't Cover
The edge is increasingly dominated by workloads that interact with specialized hardware such as GPUs, NICs, and domain-specific accelerators. The software stacks that support these devices are complex and often include proprietary components. In a shared-kernel model, multiple workloads interact with these stacks concurrently, and isolation relies on the correctness of both the kernel and the driver model.
By moving workloads into separate kernels, it becomes possible to reason more clearly about device ownership and interaction boundaries at the kernel and driver layer. Workloads no longer share the same kernel interfaces when interacting with devices, which removes a class of cross-workload interference and attack paths. This separation also enables workload-level firmware security policies: for example, in Edera's model Secure Boot can be enforced per workload for GPU stacks, a capability that is currently a significant pain point for edge providers relying on shared-kernel deployments. The separation also applies at the driver layer: different workloads can load different device drivers for the same class of hardware without sharing driver state, removing a common source of friction on nodes running heterogeneous software.
This does not eliminate hardware-level sharing. Mechanisms such as SR-IOV, vGPU partitioning, and NVIDIA MIG operate below the virtualization boundary. Isolation at that layer depends on mechanisms such as IOMMU enforcement, which constrains DMA access from devices into system memory. Where IOMMU support is absent, disabled, or misconfigured, PCIe passthrough exposes host memory directly to devices controlled by the guest: a workload with control over a passed-through device can issue DMA transactions that read or write arbitrary physical memory, bypassing every kernel and hypervisor isolation boundary. This is not a theoretical edge case. On many embedded and industrial platforms, running without a functional IOMMU is the common configuration.
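The role of the IOMMU can be shown with a toy model of device-initiated DMA. The addresses and mapping table are illustrative:

```python
# Toy model of DMA mediation by an IOMMU. Addresses and the mapping
# table are illustrative, not real hardware behavior in detail.

# Device address -> host physical page the IOMMU permits for this device.
IOMMU_MAPPING = {0x10: 0x200, 0x11: 0x201}

def dma_read(device_addr, iommu_enabled):
    """Model a device-initiated DMA read against host physical memory."""
    if iommu_enabled:
        # Only pages the IOMMU maps for this device are reachable;
        # anything else faults instead of touching host memory.
        if device_addr not in IOMMU_MAPPING:
            raise PermissionError("DMA fault: unmapped device address")
        return IOMMU_MAPPING[device_addr]
    # No IOMMU: the device-supplied address is used as a host physical
    # address directly, so any page -- including the hypervisor's or
    # another workload's memory -- is reachable.
    return device_addr
```

With the IOMMU enabled, an unmapped access faults; with it disabled, the same access lands wherever the device pointed, which is the failure mode the paragraph above describes.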
There is also a practical tension between minimal VMM designs and device-heavy workloads. Minimal VMMs reduce attack surface by limiting device emulation, but GPU workloads often require PCIe passthrough, mediated devices, or vendor-specific virtualization stacks. In practice, supporting GPUs involves selectively expanding the device surface or relying on hardware-assisted partitioning, which reintroduces shared components below the kernel boundary. This expansion is not benign: GPU stacks frequently rely on QEMU or similarly complex VMMs for device mediation, reintroducing a large, parser-heavy attack surface into the trusted computing base at precisely the point where hardware access is most privileged.
A workload can still be affected by another workload’s interaction with the same physical device, and vulnerabilities in device firmware are not mitigated by kernel isolation. The improvement is therefore specific. Isolation is strengthened at the software boundary where workloads interact with kernels and drivers. Hardware-level isolation remains a separate problem with its own constraints and mitigation strategies.
Firmware and Microarchitectural Boundaries
The isolation model described here operates at the level of kernels and hypervisors. It does not address shared components below that layer. CPU microarchitecture, speculative execution behavior, and firmware remain shared across workloads regardless of virtualization boundaries.
Classes of vulnerabilities such as Spectre, Meltdown, and their descendants operate across isolation boundaries by exploiting shared microarchitectural state. Mitigations for these issues depend on microcode updates, CPU features, and software-level mitigations such as IBRS, IBPB, and scheduling strategies that limit cross-domain leakage. In edge environments, where host kernels and firmware are often not updated frequently, these mitigations may be absent or incomplete.
This does not invalidate the isolation model, but it defines its limits. Moving the isolation boundary to the hypervisor reduces kernel-mediated attack paths, but does not eliminate risks that exist below that boundary or in shared execution resources such as CPU cores and caches.
Guest Kernel Trust and Scope
Each workload in this model carries its own kernel, which defines its execution environment. This does not imply that the guest kernel itself is inherently trusted or up to date. A workload can still run a vulnerable or poorly configured kernel within its own boundary.
The model addresses cross-workload isolation and blast radius, not the internal security posture of a given workload. Responsibility for selecting, maintaining, and validating the guest kernel remains with the system operator or workload owner. Without mechanisms such as measured boot or attestation, there is no intrinsic guarantee that the kernel running in a workload matches the intended or validated configuration, particularly in environments where physical access to devices is possible.
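The gap that measured boot or attestation closes can be sketched as a digest comparison. The image bytes and expected measurement below are illustrative, and this is a simplification of real measured-boot flows, which chain measurements through firmware and a TPM or equivalent:

```python
# Sketch of the measurement gap: without some form of measured boot or
# attestation, nothing ties the kernel actually booted to the validated
# one. Image bytes and the expected digest are illustrative.

import hashlib

EXPECTED_MEASUREMENT = hashlib.sha256(b"validated-kernel-6.6.30").hexdigest()

def verify_kernel(image_bytes):
    """Compare a kernel image's digest to the expected measurement,
    roughly as an attestation flow would before trusting the workload."""
    return hashlib.sha256(image_bytes).hexdigest() == EXPECTED_MEASUREMENT

# A tampered image fails the comparison -- but only if something
# actually performs this check; the isolation model by itself does not.
```

The isolation boundary determines how far a compromised guest kernel can reach, not whether the guest kernel is the one the operator intended to run; the latter requires a check like this one, rooted in hardware the attacker cannot rewrite.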
The Kernel Is Not a Boundary at the Edge
The accurate view at the edge is that the kernel is part of the attack surface and that its correctness cannot be assumed. Once you accept that, the question is no longer how to better constrain workloads within that kernel. The question becomes how to prevent that kernel from being the shared point of failure across all workloads on the system.
Running each workload with its own per-workload kernel, enforced by a smaller and more controlled virtualization boundary, is one answer to that question. It does not remove shared components, but it reduces their scope and exposure. It does not eliminate vulnerabilities, but it changes how far they propagate.
At the edge, the problem is not that resources are limited. The problem is that the foundation is unstable and difficult to change. Designing isolation around that constraint is not an optimization. It is a requirement.