Quickstart
A step-by-step guide on installing the zxporter (read-only) operator into your cluster.
Connect your Kubernetes Cluster
You can connect your Kubernetes cluster to the DevZero platform by deploying the zxporter operator. This lightweight, read-only component powers real-time cost insights and optimization recommendations — without modifying your workloads.
Log into the DevZero Console
After logging into the DevZero Console, click the "Connect new cluster" button in the "Clusters" section to begin the setup process.
Your K8s Provider
Choose the environment where your Kubernetes cluster is running. DevZero supports:
- Amazon EKS
- Google GKE
- Microsoft AKS
- Oracle OKE
- Other (self-managed or on-prem clusters)
After selecting your provider, copy the install command.
Install the operator
You’ll be provided a one-line script to deploy zxporter. Copy and run this script in a terminal with access to your Kubernetes cluster and kubectl configured.
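Before running the script, it can help to confirm that kubectl is pointed at the cluster you intend to connect. The following is a minimal sketch, not part of zxporter: the function name `preflight_check` is hypothetical, and it relies only on the standard `kubectl config current-context` and `kubectl get nodes` commands.

```shell
# Hypothetical helper (not part of zxporter): confirm which cluster kubectl
# will talk to before running the install script.
preflight_check() {
  # Name of the kubeconfig context that kubectl currently uses.
  ctx=$(kubectl config current-context) || {
    echo "kubectl has no current context configured" >&2
    return 1
  }
  echo "Current context: $ctx"

  # A simple read request; fails if the cluster is unreachable or access is denied.
  kubectl get nodes >/dev/null || {
    echo "Cannot list nodes in context '$ctx'" >&2
    return 1
  }
  echo "Cluster reachable"
}
```

Run `preflight_check` and confirm that the printed context name is the cluster you selected in the console before pasting the install command.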
Why not Helm? We have a Helm chart (we promise). But the quickstart is about getting to cost insights in minutes, not configuring values.yaml. The Helm chart will be waiting when you’re ready for production.
📘 Note: zxporter is fully read-only. It does not access secrets or modify cluster resources. You can inspect the manifest before applying it for full transparency.
Validating the connection
Once installed, DevZero will automatically detect and connect your cluster. Within a few minutes, you’ll start receiving real-time cost insights and workload optimization suggestions.
View dashboard
You’re now ready to explore the DevZero platform and improve your cluster’s efficiency.
Install via Helm (Recommended)
For production deployments, we recommend installing the Read Operator via Helm. The Helm chart installs both the zxporter controller and the nodemon DaemonSet (which collects node, container, and GPU metrics) in a single command.
Get the Helm command from the dashboard
- Log into the DevZero Dashboard
- Go to Clusters and select your cluster (or click Connect new cluster)
- Choose Helm as the install method
- Copy the pre-filled Helm command — it already includes your cluster token, provider, and DAKR URL
Run the command in a terminal with kubectl and helm configured for your cluster.
Verify the installation
# Check zxporter controller is running
kubectl get pods -n devzero-system -l control-plane=controller-manager
# Check nodemon is running on every node (one pod per node)
kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon -o wide
You should see the controller manager pods as 1/1 Running and one nodemon pod per node as 2/2 Running.
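The "one nodemon pod per node" expectation can also be checked by counting. This is a rough sketch under stated assumptions: the function name `nodemon_coverage` is hypothetical, it assumes the DaemonSet is expected on every node (nodes with taints the DaemonSet does not tolerate would not run a pod), and it checks the Running phase only, not the 2/2 ready count.

```shell
# Hypothetical helper: compare the number of nodes with the number of
# Running nodemon pods in the devzero-system namespace.
nodemon_coverage() {
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  pods=$(kubectl get pods -n devzero-system \
    -l app.kubernetes.io/name=zxporter-nodemon \
    --field-selector=status.phase=Running --no-headers | wc -l | tr -d ' ')
  echo "nodes=$nodes nodemon_running=$pods"
  # Succeed only when the two counts match.
  [ "$nodes" -eq "$pods" ]
}
```

Calling `nodemon_coverage` prints both counts and returns a non-zero status when they differ, which makes it usable in a script or CI step.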
Collect GPU Metrics
The zxporter Helm chart includes a nodemon DaemonSet that runs on every node. On GPU nodes, nodemon automatically collects GPU metrics via NVIDIA DCGM and enriches them with Kubernetes context (namespace, pod, container).
If your cluster has no GPU nodes, GPU metrics collection is automatically skipped — no extra configuration needed.
Check GPU nodes are visible
Confirm that GPU nodes have allocatable GPU capacity:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
| grep -v $'\t$'
If no nodes appear, the NVIDIA GPU Operator or NVIDIA k8s-device-plugin is not installed. Install one before continuing.
Check if a DCGM exporter already exists
Run this to see if your cluster already has a DCGM exporter running:
kubectl get daemonset -A | grep -i dcgm
If you already have a DCGM exporter (e.g. from GKE managed DCGM or NVIDIA GPU Operator), tell nodemon to use it instead of deploying its own.
Step 1: Find the labels on your DCGM pods:
DCGM_DS=$(kubectl get daemonset -A -o json \
| jq -r '.items[] | select(.metadata.name | contains("dcgm")) | "\(.metadata.namespace)/\(.metadata.name)"' \
| head -1)
echo "Found: $DCGM_DS"
kubectl get daemonset -n ${DCGM_DS%/*} ${DCGM_DS#*/} -o json \
| jq -r '.spec.template.metadata.labels | to_entries[] | "\(.key)=\(.value)"'
Common labels by provider:
| Cloud | Typical DCGM pod label |
|---|---|
| GCP (GKE managed) | app.kubernetes.io/name=gke-managed-dcgm-exporter |
| EKS / GPU Operator | app=nvidia-dcgm-exporter |
| Azure / GPU Operator | app=nvidia-dcgm-exporter |
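Before passing a label to nodemon, it may be worth confirming that the label actually selects your DCGM pods. A minimal sketch, assuming a hypothetical helper named `count_pods_with_label`; it uses only `kubectl get pods` with a standard label selector.

```shell
# Hypothetical helper: count pods, in any namespace, that match a label selector.
count_pods_with_label() {
  kubectl get pods -A -l "$1" --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `count_pods_with_label app=nvidia-dcgm-exporter` prints the number of matching pods. A result of 0 suggests the label is wrong for your cluster.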
Step 2: Upgrade zxporter with two extra flags.
Take the Helm command from your DevZero Dashboard and add these two flags at the end:
--set zxporter-nodemon.dcgmExporter.enabled=false \
--set zxporter-nodemon.nodemon.config.DCGM_LABELS="<YOUR_DCGM_LABEL>"
Replace <YOUR_DCGM_LABEL> with the label you found above (e.g. app=nvidia-dcgm-exporter).
These flags tell nodemon to skip deploying its own DCGM sidecar and instead discover your existing DCGM pods by their label.
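To avoid typos, the two flags can be generated from the label. This is a sketch only: the helper name `dcgm_helm_flags` is hypothetical, and the rest of the Helm command still comes from the dashboard. Helm's `--set` treats commas as separators, so the sketch escapes any comma in the value with a backslash. Whether nodemon accepts a selector containing more than one label is not stated here, so treat that case as an assumption.

```shell
# Hypothetical helper: print the two extra flags for a given DCGM pod label.
# Commas inside the value are escaped because Helm's --set splits on commas.
dcgm_helm_flags() {
  escaped=$(printf '%s' "$1" | sed 's/,/\\,/g')
  printf '%s\n' "--set zxporter-nodemon.dcgmExporter.enabled=false"
  printf '%s\n' "--set zxporter-nodemon.nodemon.config.DCGM_LABELS=\"$escaped\""
}
```

Running `dcgm_helm_flags app=nvidia-dcgm-exporter` prints the two lines to append to the dashboard command. The value is printed inside double quotes so that a pasted backslash survives shell parsing.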
If your cluster does not already have a DCGM exporter, no extra configuration is needed. The Helm command from the DevZero Dashboard already includes everything: nodemon deploys its own DCGM exporter as a sidecar container by default.
The DCGM sidecar runs on every node but idles gracefully on non-GPU nodes, so it won't crashloop or waste resources.
Verify GPU metrics are flowing
Wait for pods to be ready, then check:
# Check nodemon pods are running (one per node, 2/2 = nodemon + DCGM sidecar)
kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon
# Check GPU metrics from a nodemon pod (items[0] is the first pod returned, which is not necessarily on a GPU node)
NODEMON_IP=$(kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon \
-o jsonpath='{.items[0].status.podIP}')
kubectl run gpu-check --rm -i --restart=Never --image=curlimages/curl -n devzero-system \
-- curl -s "http://$NODEMON_IP:6061/gpu/metrics" | head -c 500
You should see JSON with gpu_utilization, framebuffer_used, temperature, etc.
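The check above queries whichever nodemon pod is listed first. In a cluster that mixes GPU and non-GPU nodes, you may prefer to target a pod that runs on a GPU node. The sketch below is one way to do that; the function name `gpu_nodemon_ip` is hypothetical, and it combines the node query from the earlier step with the standard `spec.nodeName` field selector.

```shell
# Hypothetical helper: print the IP of the nodemon pod scheduled on the first
# node that reports allocatable nvidia.com/gpu capacity greater than zero.
gpu_nodemon_ip() {
  # Each line is "<node>\t<gpu-count>"; keep the first node with a count > 0.
  node=$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
    | awk -F '\t' '$2 + 0 > 0 { print $1; exit }')
  [ -n "$node" ] || { echo "no GPU nodes found" >&2; return 1; }

  # Look up the nodemon pod running on that node and print its pod IP.
  kubectl get pods -n devzero-system \
    -l app.kubernetes.io/name=zxporter-nodemon \
    --field-selector "spec.nodeName=$node" \
    -o jsonpath='{.items[0].status.podIP}'
}
```

Setting `NODEMON_IP=$(gpu_nodemon_ip)` and then running the same `kubectl run gpu-check` command queries a pod that should have GPU data to report.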
View GPU workloads on the dashboard
Go to the DevZero Dashboard and check your GPU workloads. GPU metrics appear under the workload detail view within a few minutes.
GPU Metrics Collected
| Metric | Description |
|---|---|
| gpu_utilization | GPU compute utilization (%) |
| temperature | GPU temperature (°C) |
| memory_temperature | Memory temperature (°C) |
| power_usage | Power draw (W) |
| framebuffer_used | GPU memory used (MiB) |
| framebuffer_free | GPU memory free (MiB) |
| framebuffer_total | Total GPU memory (MiB) |
| mem_copy_util | Memory copy engine utilization (%) |
| sm_clock | SM clock frequency (MHz) |
| mem_clock | Memory clock frequency (MHz) |
| xid_errors | XID error count |
| power_violation | Power throttle time (ns) |
| thermal_violation | Thermal throttle time (ns) |
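Because the framebuffer metrics are reported in MiB, a memory utilization percentage can be derived from framebuffer_used and framebuffer_total. The sketch below is illustrative arithmetic only, with a hypothetical function name `fb_used_pct`; the dashboard may compute its own figures differently.

```shell
# Hypothetical helper: percentage of GPU memory in use, from two MiB values.
fb_used_pct() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    # Avoid dividing by zero when no total is reported.
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * used / total
  }'
}
```

For example, `fb_used_pct 4096 16384` prints `25.0`, meaning a quarter of the GPU memory is in use.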