
Quickstart

A step-by-step guide on installing the zxporter (read-only) operator into your cluster.

Connect your Kubernetes Cluster

You can connect your Kubernetes cluster to the DevZero platform by deploying the zxporter operator. This lightweight, read-only component powers real-time cost insights and optimization recommendations — without modifying your workloads.

Log into the DevZero Console

After logging into the DevZero Console, click the "Connect new cluster" button in the "Clusters" section to begin the setup process.

Your K8s Provider

Choose the environment where your Kubernetes cluster is running. DevZero supports:

  • Amazon EKS
  • Google GKE
  • Microsoft AKS
  • Oracle OKE
  • Other (self-managed or on-prem clusters)

After selecting your provider, copy the install command.

Install the operator

The console provides a one-line script that deploys zxporter. Copy it and run it in a terminal where kubectl is configured for your cluster.

Why not Helm? We have a Helm chart (we promise). But the quickstart is about getting to cost insights in minutes, not configuring values.yaml. The Helm chart will be waiting when you’re ready for production.

📘 Note: zxporter is fully read-only. It does not access secrets or modify cluster resources. You can inspect the manifest before applying it for full transparency.
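One way to act on the note above is to summarize what a manifest declares before you apply it. The sketch below assumes you have saved the manifest to a local file. The file name `zxporter.yaml` and the helper name `summarize_kinds` are illustrative and not part of the DevZero tooling.

```shell
#!/bin/sh
# Count the Kubernetes resource kinds declared in a manifest file.
# Assumption: every resource has a top-level, unindented "kind:" line.
summarize_kinds() {
  grep -E '^kind:' "$1" \
    | sed -E 's/^kind:[[:space:]]*//' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'
}
```

Run `summarize_kinds zxporter.yaml` to get lines such as `ClusterRole=2`. You can then run `kubectl apply --dry-run=client -f zxporter.yaml` to validate the file without changing the cluster, and read the `verbs` in any RBAC rules (read-only access usually means `get`, `list`, and `watch`).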

Validating the connection

Once installed, DevZero will automatically detect and connect your cluster. Within a few minutes, you’ll start receiving real-time cost insights and workload optimization suggestions.
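If you would rather wait from the terminal than refresh the console, a small wrapper around `kubectl wait` can block until the controller pods are Ready. This is a sketch under assumptions: the namespace `devzero-system` and the label `control-plane=controller-manager` are taken from the Helm verification commands on this page and are assumed to apply to the quickstart install as well. The function name `wait_for_zxporter` is hypothetical.

```shell
#!/bin/sh
# Block until the zxporter controller pods report Ready, or time out.
# Assumption: namespace and label match the Helm verification section.
wait_for_zxporter() {
  ns="${1:-devzero-system}"
  timeout="${2:-180s}"
  kubectl wait --for=condition=Ready pod \
    -l control-plane=controller-manager \
    -n "$ns" --timeout="$timeout"
}
```

Call `wait_for_zxporter` with no arguments to use the defaults. `kubectl wait` may return an error right away if no matching pods exist yet, so rerun it if the install has only just finished.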

View dashboard

You’re now ready to explore the DevZero platform and improve your cluster’s efficiency.

For production deployments, we recommend installing the zxporter (read-only) operator via Helm. The Helm chart installs both the zxporter controller and the nodemon DaemonSet (which collects node, container, and GPU metrics) in a single command.

Get the Helm command from the dashboard

  1. Log into the DevZero Dashboard
  2. Go to Clusters and select your cluster (or click Connect new cluster)
  3. Choose Helm as the install method
  4. Copy the pre-filled Helm command — it already includes your cluster token, provider, and DAKR URL

Run the command in a terminal with kubectl and helm configured for your cluster.
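Before pasting the Helm command, it can help to confirm that both tools are on your PATH and that kubectl points at the intended cluster. The helper below is a minimal sketch; the name `require_tools` is hypothetical.

```shell
#!/bin/sh
# Return 0 only if every named command is available on PATH.
# Missing tools are reported on stderr.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools kubectl helm && kubectl config current-context` prints the active context only when both tools are installed.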

Verify the installation

# Check zxporter controller is running
kubectl get pods -n devzero-system -l control-plane=controller-manager

# Check nodemon is running on every node (one pod per node)
kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon -o wide

You should see the controller manager pods as 1/1 Running and one nodemon pod per node as 2/2 Running.
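On clusters with many nodes, scanning the pod list by eye gets tedious. The filter below is a sketch that reads `kubectl get pods --no-headers` output and prints only the pods that are not fully ready or not Running. It assumes the default column order (NAME, READY, STATUS, ...), and the function name `unhealthy_pods` is hypothetical.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose READY column is not n/n or whose STATUS is not Running.
unhealthy_pods() {
  awk 'NF >= 3 {
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}
```

Usage: `kubectl get pods -n devzero-system --no-headers | unhealthy_pods`. Empty output means every pod in the namespace is fully ready.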

Collect GPU Metrics

The zxporter Helm chart includes a nodemon DaemonSet that runs on every node. On GPU nodes, nodemon automatically collects GPU metrics via NVIDIA DCGM and enriches them with Kubernetes context (namespace, pod, container).

If your cluster has no GPU nodes, GPU metrics collection is automatically skipped — no extra configuration needed.

Check GPU nodes are visible

Confirm that GPU nodes have allocatable GPU capacity:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
  | grep -v $'\t$'

If no nodes appear, either the cluster has no GPU nodes or the NVIDIA device plugin is not running on them. If you do have GPU nodes, install the NVIDIA GPU Operator or the NVIDIA k8s-device-plugin before continuing.
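The `grep` filter above drops nodes with an empty GPU column but keeps a node that reports `0` allocatable GPUs. As a sketch of a stricter filter, the helper below keeps only nodes with at least one allocatable GPU. The function name `gpu_nodes` is hypothetical, and it expects the same tab-separated output as the command above.

```shell
#!/bin/sh
# Read "<node>\t<allocatable GPUs>" lines on stdin and print only the
# nodes that report at least one allocatable GPU.
#
# Usage (same jsonpath query as in the text):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | gpu_nodes
gpu_nodes() {
  awk -F '\t' '$2 + 0 > 0 { print $1 "\t" $2 }'
}
```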

Check if a DCGM exporter already exists

Run this to see if your cluster already has a DCGM exporter running:

kubectl get daemonset -A | grep -i dcgm

If you already have a DCGM exporter (e.g. from GKE managed DCGM or NVIDIA GPU Operator), tell nodemon to use it instead of deploying its own.

Step 1: Find the labels on your DCGM pods:

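# Find the first DaemonSet whose name contains "dcgm" (case-insensitive), as namespace/name
DCGM_DS=$(kubectl get daemonset -A -o json \
  | jq -r '.items[] | select(.metadata.name | ascii_downcase | contains("dcgm")) | "\(.metadata.namespace)/\(.metadata.name)"' \
  | head -1)
echo "Found: $DCGM_DS"

# Print the pod template labels of that DaemonSet as key=value pairs
kubectl get daemonset -n "${DCGM_DS%/*}" "${DCGM_DS#*/}" -o json \
  | jq -r '.spec.template.metadata.labels | to_entries[] | "\(.key)=\(.value)"'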

Common labels by provider:

| Cloud | Typical DCGM pod label |
| --- | --- |
| GCP (GKE managed) | app.kubernetes.io/name=gke-managed-dcgm-exporter |
| EKS / GPU Operator | app=nvidia-dcgm-exporter |
| Azure / GPU Operator | app=nvidia-dcgm-exporter |

Step 2: Upgrade zxporter with two extra flags.

Take the Helm command from your DevZero Dashboard and add these two flags at the end:

--set zxporter-nodemon.dcgmExporter.enabled=false \
--set zxporter-nodemon.nodemon.config.DCGM_LABELS="<YOUR_DCGM_LABEL>"

Replace <YOUR_DCGM_LABEL> with the label you found above (e.g. app=nvidia-dcgm-exporter).

These flags tell nodemon to skip deploying its own DCGM sidecar and instead discover your existing DCGM pods by their label.
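Because nodemon discovers the existing exporter by label, a mistyped label means it finds nothing. Before upgrading, you may want to confirm that the label selects at least one pod. The helper below is a minimal sketch; the name `dcgm_pod_count` is hypothetical.

```shell
#!/bin/sh
# Print how many pods, across all namespaces, match a label selector.
# Prints 0 when nothing matches.
dcgm_pod_count() {
  kubectl get pods -A -l "$1" --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `dcgm_pod_count app=nvidia-dcgm-exporter` should print a number equal to your GPU node count if the exporter runs once per GPU node. A result of `0` suggests the label is wrong for your cluster.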

If your cluster has no existing DCGM exporter, no extra configuration is needed. The Helm command from the DevZero Dashboard already includes everything: by default, nodemon deploys its own DCGM exporter as a sidecar container.

The DCGM sidecar runs on every node but idles gracefully on non-GPU nodes — it won't crashloop or waste resources.

Verify GPU metrics are flowing

Wait for pods to be ready, then check:

# Check nodemon pods are running (one per node, 2/2 = nodemon + DCGM sidecar)
kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon

# Check GPU metrics from the nodemon pod on a GPU node
# (replace <GPU_NODE_NAME> with a node that reports allocatable GPUs)
NODEMON_IP=$(kubectl get pods -n devzero-system -l app.kubernetes.io/name=zxporter-nodemon \
  --field-selector "spec.nodeName=<GPU_NODE_NAME>" \
  -o jsonpath='{.items[0].status.podIP}')
kubectl run gpu-check --rm -i --restart=Never --image=curlimages/curl -n devzero-system \
  -- curl -s "http://$NODEMON_IP:6061/gpu/metrics" | head -c 500

You should see JSON with gpu_utilization, framebuffer_used, temperature, etc.
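To turn that visual check into a pass/fail result, you could test the response for the field names you expect. The sketch below does a plain substring check and does not parse the JSON, so treat it as a smoke test. The function name `has_gpu_fields` is hypothetical, and the three field names come from the metrics listed on this page.

```shell
#!/bin/sh
# Read a /gpu/metrics response on stdin. Succeed only if it mentions
# every expected field name; report the first missing one on stderr.
has_gpu_fields() {
  body=$(cat)
  for field in gpu_utilization framebuffer_used temperature; do
    case "$body" in
      *"$field"*) ;;
      *) echo "missing field: $field" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Pipe the full `curl` output into `has_gpu_fields` rather than the `head -c 500` preview, since truncation could cut off a field and cause a false failure.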

View GPU workloads on the dashboard

Go to the DevZero Dashboard and check your GPU workloads. GPU metrics appear under the workload detail view within a few minutes.

GPU Metrics Collected

| Metric | Description |
| --- | --- |
| gpu_utilization | GPU compute utilization (%) |
| temperature | GPU temperature (°C) |
| memory_temperature | Memory temperature (°C) |
| power_usage | Power draw (W) |
| framebuffer_used | GPU memory used (MiB) |
| framebuffer_free | GPU memory free (MiB) |
| framebuffer_total | Total GPU memory (MiB) |
| mem_copy_util | Memory copy engine utilization (%) |
| sm_clock | SM clock frequency (MHz) |
| mem_clock | Memory clock frequency (MHz) |
| xid_errors | XID error count |
| power_violation | Power throttle time (ns) |
| thermal_violation | Thermal throttle time (ns) |
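The raw values are easier to read after a little arithmetic. As a sketch based only on the units listed above, the helpers below derive GPU memory usage as a percentage and convert throttle time from nanoseconds to seconds. The function names `fb_used_pct` and `ns_to_seconds` are hypothetical.

```shell
#!/bin/sh
# GPU memory in use as a percentage of total.
# Arguments: framebuffer_used and framebuffer_total, both in MiB.
fb_used_pct() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", used / total * 100
  }'
}

# Convert a throttle duration (power_violation or thermal_violation)
# from nanoseconds to seconds.
ns_to_seconds() {
  awk -v ns="$1" 'BEGIN { printf "%.3f\n", ns / 1e9 }'
}
```

For example, `fb_used_pct 4096 16384` prints `25.0`, and `ns_to_seconds 2500000000` prints `2.500`.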
