Reduce Cloud Spend with Rightsizing MicroVMs

Pay for what you use. Not what you think you’ll need.
What Makes DevZero Unique
Kubernetes-native MicroVMs
MicroVMs combine the security and isolation of traditional VMs with the build tooling and snappiness of containers, across any mix of compute, memory, and storage. Workloads on DevZero use a Kubernetes runtime class that leverages a Type 2 hypervisor to spawn microVMs, within which containerized workloads execute.
Multi-dimensional Pod Autoscaler
Workloads (technically, pods) are scaled in place based on their resource consumption. When needed, workloads are dynamically live-migrated to more suitable nodes without requiring restarts. The MPA operator manages cluster autoscaling, selecting optimal instances to maintain efficient cluster configurations while preserving service-level objectives (SLOs).
How It Works
MicroVMs + Type 2 Hypervisor
Kata Containers is used to spawn workloads inside lightweight VMs backed by a cloud hypervisor. This is exposed as a Kubernetes runtime class, so existing OCI images work without any rebuilds. Every workload (pod → container) is wrapped in a microVM with its own guest OS and kernel, shared with no other workload on the same host. A minimal sketch of the runtime-class wiring follows the list below.
  • Hypervisor-level isolation → Prevents container escapes and allows developers to run privileged containers without risking the underlying infra.
  • Minimal overhead → Lightweight guest OS in a microVM with sub-200 ms startup time.
  • Kubernetes-compatible → Works seamlessly with existing Kubernetes infrastructure and configurations.
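
To make the wiring concrete, here is a minimal sketch in Go using the standard k8s.io/api types: a RuntimeClass pointing at a Kata handler, and a pod that opts into it. The names (kata, demo) and the image are placeholders, and the handler name in a real DevZero cluster may differ.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	nodev1 "k8s.io/api/node/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A RuntimeClass that routes pods to the Kata Containers handler,
	// which boots each pod inside its own microVM.
	rc := nodev1.RuntimeClass{
		ObjectMeta: metav1.ObjectMeta{Name: "kata"},
		Handler:    "kata", // CRI handler name; installation-specific
	}

	// An unmodified OCI image opts in by naming the runtime class.
	runtimeClass := "kata"
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Spec: corev1.PodSpec{
			RuntimeClassName: &runtimeClass,
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "nginx:1.27", // existing image, no rebuild needed
			}},
		},
	}

	for _, obj := range []interface{}{rc, pod} {
		out, _ := yaml.Marshal(obj)
		fmt.Println(string(out))
	}
}

Because the opt-in is a single field on the pod spec, existing manifests carry over unchanged.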
Live Migration
Containerized workloads are checkpointed at the system level using CRIU, preserving process memory and state, then restored onto more suitable nodes. TCP connections are preserved. In certain cases, workload state is streamed directly to the target node. A sketch of the underlying checkpoint/restore calls follows the list below.
  • Zero downtime irrespective of Kubernetes runtime → Upgrade, drain, or rotate nodes without stopping workloads.
  • Load balancing → Redistribute workloads automatically based on demand.
  • Session persistence → Maintain active connections and session data.
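
The checkpoint/restore mechanics can be illustrated with CRIU's command-line interface. The sketch below is not DevZero's orchestration code, just a minimal Go wrapper around the two halves of a migration; the PID and image directory are hypothetical, and --tcp-established is what keeps established connections alive across the move.

package main

import (
	"log"
	"os/exec"
	"strconv"
)

// checkpoint dumps a running process tree to imgDir, keeping
// established TCP connections so they survive the migration.
func checkpoint(pid int, imgDir string) error {
	cmd := exec.Command("criu", "dump",
		"--tree", strconv.Itoa(pid),
		"--images-dir", imgDir,
		"--tcp-established", // preserve open TCP connections
		"--shell-job")
	return cmd.Run()
}

// restore recreates the process tree from imgDir on the target node,
// after the image directory has been copied or streamed over.
func restore(imgDir string) error {
	cmd := exec.Command("criu", "restore",
		"--images-dir", imgDir,
		"--tcp-established",
		"--shell-job")
	return cmd.Run()
}

func main() {
	if err := checkpoint(12345, "/tmp/ckpt"); err != nil { // hypothetical PID
		log.Fatal(err)
	}
	if err := restore("/tmp/ckpt"); err != nil {
		log.Fatal(err)
	}
}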
Multi-dimensional Pod Autoscaling (MPA)
The autoscaler monitors resource utilization across the cluster. Based on usage, it adjusts the number of replicas and the resources allocated to each workload, resizing the container sandbox in place without restarting it. It also changes instance types to optimize costs while maintaining SLOs. An in-place resize is sketched after the list below.
  • Sandbox RAM resizing → Uses free-page reporting to release unused memory, backed by a balloon device.
  • MPA → Combines the responsibilities of the Vertical Pod Autoscaler, Horizontal Pod Autoscaler, and Cluster Autoscaler.
  • Fewer evictions → Swaps cold pages to prevent unexpected shutdowns.
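
In-place resizing leans on Kubernetes' in-place pod vertical scaling (the InPlacePodVerticalScaling feature gate, alpha since 1.27; newer releases route the change through the pod's resize subresource). A minimal client-go sketch, with a hypothetical pod name and resource values:

package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Shrink the container's CPU request/limit without restarting it.
	patch := []byte(`{"spec":{"containers":[{"name":"app",
		"resources":{"requests":{"cpu":"250m"},"limits":{"cpu":"500m"}}}]}}`)

	// The "resize" subresource applies the change in place.
	_, err = cs.CoreV1().Pods("default").Patch(context.TODO(), "demo",
		types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "resize")
	if err != nil {
		log.Fatal(err)
	}
}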
Dynamic Resource Adjustment System (DRAS)
Dynamically adjusts compute and memory resources to match workload demand. DRAS continuously monitors usage and optimizes allocations to prevent both wasted resources and performance bottlenecks. A toy version of the sizing decision follows the list below.
  • Auto-scaling → Allocates resources based on workload demand.
  • Hybrid manual/auto mode → Set scaling policies with flexibility to override.
  • Better stability → Balances workloads to prevent bottlenecks and slowdowns.
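
DevZero has not published DRAS internals, so the following is only a toy illustration of the kind of sizing decision such a control loop makes on each tick; every threshold in it is a made-up placeholder.

package main

import "fmt"

// recommend is a toy version of the per-tick sizing decision a system
// like DRAS implies: scale the CPU request toward observed usage with
// some headroom, and clamp it between a floor and a ceiling.
// All thresholds here are hypothetical.
func recommend(usedMilli, requestMilli int64) int64 {
	const headroom = 1.2              // keep 20% above observed usage
	const floor, ceiling = 100, 16000 // 0.1 to 16 cores, in millicores
	target := int64(float64(usedMilli) * headroom)
	if target < floor {
		target = floor
	}
	if target > ceiling {
		target = ceiling
	}
	// Only act on meaningful drift (>10%) to avoid resize churn.
	if abs(target-requestMilli)*10 < requestMilli {
		return requestMilli
	}
	return target
}

func abs(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	fmt.Println(recommend(300, 2000))  // overprovisioned: shrink to 360m
	fmt.Println(recommend(1900, 2000)) // near capacity: grow to 2280m
}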
Accelerated Compute & GPU Multi-Tenancy
Support high-performance workloads with secure, isolated GPU and DPU acceleration. MicroVMs, combined with NVIDIA BlueField-3 DPUs and Multi-Instance GPU (MIG) technology, enable efficient multi-tenancy without sacrificing performance or security. A MIG-sliced workload is sketched after the list below.
  • DPU isolation → Offloads networking, security, and storage from the host.
  • MIG for AI → Splits GPUs into smaller, isolated instances.
  • Seamless integration → Accelerated resources are scheduled like any other Kubernetes resource, with no extra configuration.
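
As an illustration, with the NVIDIA device plugin in MIG mode, each GPU slice shows up as its own extended resource that pods can request. The sketch below assumes a 1g.5gb profile (A100-class hardware) and a hypothetical image tag; actual resource names depend on how the GPUs are partitioned.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Request one 1g.5gb MIG slice; the NVIDIA device plugin
	// advertises each slice as its own schedulable resource.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "mig-demo"},
		Spec: corev1.PodSpec{
			RuntimeClassName: ptr("kata"), // same microVM isolation as above
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "nvcr.io/nvidia/pytorch:24.08-py3", // hypothetical tag
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						"nvidia.com/mig-1g.5gb": resource.MustParse("1"),
					},
				},
			}},
		},
	}
	out, _ := yaml.Marshal(pod)
	fmt.Println(string(out))
}

func ptr(s string) *string { return &s }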
Reduce Your Cloud Spend with Live Rightsizing MicroVMs
Run workloads in secure, right-sized microVMs with built-in observability and dynamic scaling. Install a single operator and you are on the path to reducing cloud spend.
Get full visibility and pay only for what you use.