A Kubernetes mutating webhook that converts GPU device resources to Dynamic Resource Allocation (DRA) ResourceClaims.
This webhook automatically transforms Pod specifications that request GPU resources (e.g., nvidia.com/gpu) into DRA ResourceClaims, enabling dynamic resource allocation for GPU workloads in Kubernetes.
Features:
- Automatic Resource Conversion: Converts GPU resource requests to ResourceClaims
- Resource Cleanup: Automatically removes GPU resources from Pod specs and creates corresponding ResourceClaims
- Annotation Support: Supports device selection via Pod annotations (UUID, device type)
- Metrics Monitoring: Optional monitor component that collects and exposes GPU resource metrics via Prometheus
Prerequisites:
- Kubernetes 1.34 or later with the DRA Consumable Capacity feature gate (DRAConsumableCapacity) enabled
- CDI enabled in the underlying container runtime (such as containerd or CRI-O)
- NVIDIA GPU Driver 440 or later
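The driver requirement can be checked on a GPU node before installing. The following is a minimal sketch, not part of this project: `driver_ok` is a hypothetical helper that compares only the major version number, and it assumes `nvidia-smi` is available on the node.

```shell
# Hypothetical helper: succeeds when the driver's major version is 440 or newer.
driver_ok() {
  major="${1%%.*}"          # "535.104.05" -> "535"
  [ "$major" -ge 440 ]
}

# On a GPU node (requires nvidia-smi on PATH):
# version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
# driver_ok "$version" && echo "driver $version is new enough"
```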
You need to ensure cert-manager is installed before installing the webhook.
Then you can install the webhook with the following command:
helm install hami-dra ./charts/hami-dra

If you are not using the containerd drivers provided by gpu-operator, install the webhook with:

helm install hami-dra ./charts/hami-dra \
  --set drivers.nvidia.containerDriver=false

To disable the monitor component:

helm install hami-dra ./charts/hami-dra \
  --set monitor.enabled=false

After installation, use it the same way as HAMi.
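As a rough illustration of HAMi-style usage, the sketch below writes a Pod manifest that requests a GPU through extended resources. It assumes the default resource names from charts/hami-dra/values.yaml and HAMi's conventions (gpumem in megabytes, gpucores as a percentage of one GPU). The Pod name, file name, and image are placeholders, not values taken from this project.

```shell
# Write a test Pod manifest; all names and the image below are placeholders.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1        # number of GPUs
          nvidia.com/gpumem: 3000  # device memory (assumed MB, as in HAMi)
          nvidia.com/gpucores: 30  # share of GPU cores (assumed percent, as in HAMi)
EOF

# Against a cluster where the webhook is running:
# kubectl apply -f gpu-pod.yaml
# kubectl get resourceclaims
# kubectl get pod gpu-pod -o jsonpath='{.spec.resourceClaims}'
```

If the webhook handled the Pod, the last two commands should show a ResourceClaim and a non-empty `spec.resourceClaims` field in place of the GPU limits.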
Configure device resources in charts/hami-dra/values.yaml:
resourceName: "nvidia.com/gpu"
resourceMem: "nvidia.com/gpumem"
resourceCores: "nvidia.com/gpucores"

The monitor component is an optional feature that collects and exposes GPU resource metrics via Prometheus. It is enabled by default.
Quick Start:
Set the monitor service type to NodePort so that it can be reached from outside the cluster:
monitor:
  enabled: true
  service:
    type: NodePort
    nodePort:
      metrics: 31995

Access metrics:
# With NodePort
curl http://<node-ip>:31995/metrics

The response contains the exported GPU resource metrics.
For detailed configuration, metrics documentation, and Prometheus integration, see MONITOR.md.
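When inspecting the endpoint by hand, it can help to hide the `# HELP` and `# TYPE` comment lines of the Prometheus text format. The sketch below is one way to do that. `metric_samples` is a hypothetical helper, `NODE_IP` is a placeholder, and port 31995 is taken from the NodePort example above.

```shell
# Hypothetical helper: keep only sample lines from Prometheus text output
# by dropping comment lines and blank lines read from stdin.
metric_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$'
}

# Usage against a running monitor (replace NODE_IP with a real node address):
# curl -fsS "http://${NODE_IP}:31995/metrics" | metric_samples | head
```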
