Enable the GPU time slicing feature
The NVIDIA device plugin supports GPU oversubscription through either Time-Slicing or MPS; the two methods are mutually exclusive. Time-Slicing uses CUDA time-slicing to run concurrent workloads on a single GPU. The workloads share GPU memory and run in the same fault domain, so a crash in one workload affects all the others. MPS, by contrast, uses a control daemon to manage access to the GPU and enforces space partitioning: each workload is allocated an explicit share of memory and compute resources, which isolates workloads and prevents resource contention. In both modes the same sharing configuration applies to every GPU on a node; per-GPU customization is not supported.
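Regardless of the mode chosen, workloads consume a shared GPU through the ordinary nvidia.com/gpu resource. As an illustrative sketch (the pod name and CUDA image below are placeholders, not part of this setup), a pod claiming one replica of a shared GPU might look like:

```shell
# Hypothetical pod requesting one replica of a shared GPU.
# With Time-Slicing or MPS enabled, "nvidia.com/gpu: 1" claims
# one of the N advertised replicas, not a whole physical GPU.
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing-demo
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```

Because replicas of the same physical GPU are indistinguishable to the scheduler, several such pods can land on one device and will share it according to the configured mode.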
Prerequisites:
- NVIDIA drivers and the NVIDIA container toolkit installed on the GPU nodes
- The device plugin Helm repository added: helm repo add nvdp https://nvidia.github.io/k8s-device-plugin && helm repo update
Time-Slicing
1. Create the config
cat << EOF > /tmp/gpu-time-slicing-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
sharing:
  timeSlicing:
    failRequestsGreaterThanOne: true
    resources:
    - name: nvidia.com/gpu
      replicas: 10
EOF
2. Apply the Helm upgrade
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --version=0.17.1 \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set gfd.enabled=true \
  --set-file config.map.config=/tmp/gpu-time-slicing-config.yaml
3. Verify the change was applied
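One way to check (a sketch; substitute your own node name, and note that exact label names depend on the GPU Feature Discovery version) is to inspect the node's advertised capacity and GPU labels:

```shell
# Capacity/Allocatable should now show replicas x physical GPUs,
# e.g. "nvidia.com/gpu: 10" on a node with one GPU.
kubectl describe node <node-name> | grep nvidia.com/gpu

# With gfd.enabled=true, sharing is also reflected in node labels
# (label names and values may vary by GFD version).
kubectl get node <node-name> -o jsonpath='{.metadata.labels}' | tr ',' '\n' | grep nvidia.com/gpu
```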
Multi-Process Service (MPS)
1. Create the config
cat << EOF > /tmp/gpu-mps-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 12
EOF
2. Apply the Helm upgrade
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --version=0.17.1 \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set gfd.enabled=true \
  --set-file config.map.config=/tmp/gpu-mps-config.yaml
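3. Verify the change was applied

As with time-slicing, the node's capacity can be inspected (a sketch; substitute your own node name, and the exact workload names in the plugin namespace may vary by chart version):

```shell
# Capacity should now show 12 replicas per physical GPU.
kubectl describe node <node-name> | grep nvidia.com/gpu

# In MPS mode the chart also runs an MPS control daemon alongside
# the device plugin; confirm its pods are running on GPU nodes.
kubectl get pods -n nvidia-device-plugin
```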