---
title: Deploy Kubernetes workload using GPU sharing on Azure Stack Edge Pro GPU device
description: Describes how you can deploy a GPU shared workload via Kubernetes on your Azure Stack Edge Pro GPU device.
services: databox
author: alkohli
ms.service: azure-stack-edge
ms.topic: how-to
ms.date: 03/12/2021
ms.author: alkohli
---
# Deploy a Kubernetes workload using GPU sharing on your Azure Stack Edge Pro
This article describes how containerized workloads can share the GPUs on your Azure Stack Edge Pro GPU device. In this article, you'll run two jobs: one without GPU context-sharing and one with context-sharing enabled via the Multi-Process Service (MPS) on the device. For more information, see [Multi-Process Service](https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf).
## Prerequisites
Before you begin, make sure that:
1. You have access to an Azure Stack Edge Pro GPU device that is [activated](azure-stack-edge-gpu-deploy-activate.md) and has [compute configured](azure-stack-edge-gpu-deploy-configure-compute.md). You have the [Kubernetes API endpoint](azure-stack-edge-gpu-deploy-configure-compute.md#get-kubernetes-endpoints) and you've added this endpoint to the `hosts` file on the client that will access the device.
1. You have access to a client system with a [supported operating system](azure-stack-edge-gpu-system-requirements.md#supported-os-for-clients-connected-to-device). If you're using a Windows client, the system should run PowerShell 5.0 or later to access the device.
1. You have created a namespace and a user, and you have granted the user access to this namespace. You have the kubeconfig file for this namespace installed on the client system that you'll use to access your device. For detailed instructions, see [Connect to and manage a Kubernetes cluster via kubectl on your Azure Stack Edge Pro GPU device](azure-stack-edge-gpu-create-kubernetes-cluster.md#configure-cluster-access-via-kubernetes-rbac). A quick way to verify this access is shown after the deployment YAML that follows.
1. Save the following deployment `yaml` on your local system. You'll use this file to run the Kubernetes deployment. This deployment is based on [Simple CUDA containers](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#running-simple-containers) that are publicly available from NVIDIA.
```yml
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample1
spec:
  template:
    spec:
      hostPID: true
      hostIPC: true
      containers:
        - name: cuda-sample-container1
          image: nvidia/samples:nbody
          command: ["/tmp/nbody"]
          args: ["-benchmark", "-i=10000"]
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "0"
      restartPolicy: "Never"
  backoffLimit: 1

---

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample2
spec:
  template:
    spec:
      hostPID: true
      hostIPC: true
      containers:
        - name: cuda-sample-container2
          image: nvidia/samples:nbody
          command: ["/tmp/nbody"]
          args: ["-benchmark", "-i=10000"]
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "0"
      restartPolicy: "Never"
  backoffLimit: 1
```
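Before you run the deployment, you can optionally confirm that `kubectl` on your client can reach the namespace you created. This is a minimal check; the kubeconfig path `C:\kubeconfig\config` and the namespace `mynamesp1` are example values only, so substitute your own.
```powershell
# Point kubectl at the kubeconfig file you copied from the device (example path).
$env:KUBECONFIG = "C:\kubeconfig\config"

# Confirm that the cluster is reachable and that your namespace exists.
kubectl cluster-info
kubectl get namespace mynamesp1

# List the pods in the namespace. An empty namespace returns "No resources found."
kubectl get pods -n mynamesp1
```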
## Verify GPU driver, CUDA version
The first step is to verify that your device is running the required GPU driver and CUDA versions.
1. [Connect to the PowerShell interface of your device](azure-stack-edge-gpu-connect-powershell-interface.md#connect-to-the-powershell-interface).
1. Run the following command:
```powershell
Get-HcsGpuNvidiaSmi
```
1. In the NVIDIA smi output, make a note of the GPU driver version and the CUDA version on your device. If you're running Azure Stack Edge 2102 software, these correspond to the following versions:
- GPU driver version: 460.32.03
- CUDA version: 11.2
Here is an example output:
```powershell
[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:
Wed Mar 3 12:24:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00002C74:00:00.0 Off | 0 |
| N/A 34C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>
```
1. Keep this session open as you will use it to view the NVIDIA smi output throughout the article.
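Optionally, you can also check from your client whether the GPU is advertised as a schedulable resource on the device node. This assumes that the NVIDIA device plugin on your device exposes the GPU as the `nvidia.com/gpu` resource; if your configuration differs, the line may not appear.
```powershell
# List the node(s) on the device.
kubectl get nodes

# Check whether the node advertises an NVIDIA GPU resource.
# Replace <Name of the node> with a node name returned by the previous command.
kubectl describe node <Name of the node> | Select-String "nvidia.com/gpu"
```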
## Job without context-sharing
You'll run the first job to deploy an application on your device in the namespace `mynamesp1`. This application deployment also shows that GPU context-sharing isn't enabled by default.
1. List all the pods running in the namespace. Run the following command:
```powershell
kubectl get pods -n <Name of the namespace>
```
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
No resources found.
```
1. Start a deployment job on your device using the deployment.yaml provided earlier. Run the following command:
```powershell
kubectl apply -f <Path to the deployment .yaml> -n <Name of the namespace>
```
This job creates two containers and runs an n-body simulation on both containers. The number of simulation iterations is specified in the `.yaml`.
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl apply -f C:\gpu-sharing\k8-gpusharing.yaml -n mynamesp1
job.batch/cuda-sample1 created
job.batch/cuda-sample2 created
PS C:\WINDOWS\system32>
```
1. To list the pods started in the deployment, run the following command:
```powershell
kubectl get pods -n <Name of the namespace>
```
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME READY STATUS RESTARTS AGE
cuda-sample1-27srm 1/1 Running 0 28s
cuda-sample2-db9vx 1/1 Running 0 27s
PS C:\WINDOWS\system32>
```
There are two pods, `cuda-sample1-27srm` and `cuda-sample2-db9vx`, running on your device.
1. Fetch the details of the pods. Run the following command:
```powershell
kubectl -n <Name of the namespace> describe job <Name of the job>
```
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl -n mynamesp1 describe job.batch/cuda-sample1; kubectl -n mynamesp1 describe job.batch/cuda-sample2
Name: cuda-sample1
Namespace: mynamesp1
Selector: controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
Labels: controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
job-name=cuda-sample1
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample1","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism: 1
Completions: 1
Start Time: Wed, 03 Mar 2021 12:25:34 -0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
job-name=cuda-sample1
Containers:
cuda-sample-container1:
Image: nvidia/samples:nbody
Port: <none>
Host Port: <none>
Command:
/tmp/nbody
Args:
-benchmark
-i=10000
Environment:
NVIDIA_VISIBLE_DEVICES: 0
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 60s job-controller Created pod: cuda-sample1-27srm
Name: cuda-sample2
Namespace: mynamesp1
Selector: controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
Labels: controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
job-name=cuda-sample2
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample2","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism: 1
Completions: 1
Start Time: Wed, 03 Mar 2021 12:25:35 -0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
job-name=cuda-sample2
Containers:
cuda-sample-container2:
Image: nvidia/samples:nbody
Port: <none>
Host Port: <none>
Command:
/tmp/nbody
Args:
-benchmark
-i=10000
Environment:
NVIDIA_VISIBLE_DEVICES: 0
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 60s job-controller Created pod: cuda-sample2-db9vx
PS C:\WINDOWS\system32>
```
The output indicates that both pods were successfully created by their respective jobs.
1. While both containers are running the n-body simulation, view the GPU utilization in the NVIDIA smi output. Go to the PowerShell interface of the device and run `Get-HcsGpuNvidiaSmi`.
Here is an example output when both the containers are running the n-body simulation:
```powershell
[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:
Wed Mar 3 12:26:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00002C74:00:00.0 Off | 0 |
| N/A 64C P0 69W / 70W | 221MiB / 15109MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 197976 C /tmp/nbody 109MiB |
| 0 N/A N/A 198051 C /tmp/nbody 109MiB |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>
```
As you can see, there are two containers (Type = C) running the n-body simulation on GPU 0.
1. Monitor the n-body simulation. Run the `kubectl get pods` command. Here is an example output while the simulation is running:
```powershell
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME READY STATUS RESTARTS AGE
cuda-sample1-27srm 1/1 Running 0 70s
cuda-sample2-db9vx 1/1 Running 0 69s
PS C:\WINDOWS\system32>
```
When the simulation is complete, the pod status changes to `Completed`. Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME READY STATUS RESTARTS AGE
cuda-sample1-27srm 0/1 Completed 0 2m54s
cuda-sample2-db9vx 0/1 Completed 0 2m53s
PS C:\WINDOWS\system32>
```
1. After the simulation is complete, you can view the logs and the total time for the simulation to complete. Run the following command; a short example that extracts just the timing lines appears at the end of this section.
```powershell
kubectl logs -n <Name of the namespace> <pod name>
```
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample1-27srm
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 170398.766 ms
= 98.459 billion interactions per second
= 1969.171 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>
```
```powershell
PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample2-db9vx
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 170368.859 ms
= 98.476 billion interactions per second
= 1969.517 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>
```
1. There should be no processes running on the GPU at this time. You can verify this by viewing the GPU utilization in the NVIDIA smi output.
```powershell
[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:
Wed Mar 3 12:32:52 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00002C74:00:00.0 Off | 0 |
| N/A 38C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>
```
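Before you move on to the context-sharing run, you may want to capture just the benchmark timing line from each pod's logs so that you can compare it against the next run. The following is a minimal sketch; the pod names `cuda-sample1-27srm` and `cuda-sample2-db9vx` are taken from the example output above, so substitute the pod names from your own deployment.
```powershell
# Extract only the "total time" line from each pod's log (pod names are examples).
kubectl logs -n mynamesp1 cuda-sample1-27srm | Select-String "total time"
kubectl logs -n mynamesp1 cuda-sample2-db9vx | Select-String "total time"
```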
## Job with context-sharing
You'll run the second job to deploy the n-body simulation on two CUDA containers with GPU context-sharing enabled through MPS. First, you'll enable MPS on the device.
1. [Connect to the PowerShell interface of your device](azure-stack-edge-gpu-connect-powershell-interface.md).
1. To enable MPS on your device, run the `Start-HcsGpuMPS` command.
```powershell
[10.100.10.10]: PS>Start-HcsGpuMPS
K8S-1HXQG13CL-1HXQG13:
Set compute mode to EXCLUSIVE_PROCESS for GPU 00002C74:00:00.0.
All done.
Created nvidia-mps.service
[10.100.10.10]: PS>
```
1. Run the job using the same deployment `yaml` you used earlier. You may need to delete the existing deployment first. See [Delete deployment](#delete-deployment).
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl -n mynamesp1 delete -f C:\gpu-sharing\k8-gpusharing.yaml
job.batch "cuda-sample1" deleted
job.batch "cuda-sample2" deleted
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
No resources found.
PS C:\WINDOWS\system32> kubectl -n mynamesp1 apply -f C:\gpu-sharing\k8-gpusharing.yaml
job.batch/cuda-sample1 created
job.batch/cuda-sample2 created
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME READY STATUS RESTARTS AGE
cuda-sample1-vcznt 1/1 Running 0 21s
cuda-sample2-zkx4w 1/1 Running 0 21s
PS C:\WINDOWS\system32> kubectl -n mynamesp1 describe job.batch/cuda-sample1; kubectl -n mynamesp1 describe job.batch/cuda-sample2
Name: cuda-sample1
Namespace: mynamesp1
Selector: controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
Labels: controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
job-name=cuda-sample1
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample1","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism: 1
Completions: 1
Start Time: Wed, 03 Mar 2021 21:51:51 -0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
job-name=cuda-sample1
Containers:
cuda-sample-container1:
Image: nvidia/samples:nbody
Port: <none>
Host Port: <none>
Command:
/tmp/nbody
Args:
-benchmark
-i=10000
Environment:
NVIDIA_VISIBLE_DEVICES: 0
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 46s job-controller Created pod: cuda-sample1-vcznt
Name: cuda-sample2
Namespace: mynamesp1
Selector: controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
Labels: controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
job-name=cuda-sample2
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample2","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism: 1
Completions: 1
Start Time: Wed, 03 Mar 2021 21:51:51 -0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
job-name=cuda-sample2
Containers:
cuda-sample-container2:
Image: nvidia/samples:nbody
Port: <none>
Host Port: <none>
Command:
/tmp/nbody
Args:
-benchmark
-i=10000
Environment:
NVIDIA_VISIBLE_DEVICES: 0
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 47s job-controller Created pod: cuda-sample2-zkx4w
PS C:\WINDOWS\system32>
```
1. While the simulation is running, you can view the NVIDIA smi output. The output shows the processes corresponding to the CUDA containers (M + C type) running the n-body simulation, and the MPS service (C type) running alongside them. All these processes share GPU 0.
```powershell
PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:
Mon Mar 3 21:54:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 0000E00B:00:00.0 Off | 0 |
| N/A 45C P0 68W / 70W | 242MiB / 15109MiB | 100% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 144377 M+C /tmp/nbody 107MiB |
| 0 N/A N/A 144379 M+C /tmp/nbody 107MiB |
| 0 N/A N/A 144443 C nvidia-cuda-mps-server 25MiB |
+-----------------------------------------------------------------------------+
```
1. After the simulation is complete, you can view the logs and the total time for the simulation to complete. Run the following commands:
```powershell
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME READY STATUS RESTARTS AGE
cuda-sample1-vcznt 0/1 Completed 0 5m44s
cuda-sample2-zkx4w 0/1 Completed 0 5m44s
PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample1-vcznt
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 154979.453 ms
= 108.254 billion interactions per second
= 2165.089 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample2-zkx4w
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 154986.734 ms
= 108.249 billion interactions per second
= 2164.987 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>
```
1. After the simulation is complete, you can view the NVIDIA smi output again. Only the nvidia-cuda-mps-server process for the MPS service shows as running.
```powershell
PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:
Mon Mar 3 21:59:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 0000E00B:00:00.0 Off | 0 |
| N/A 37C P8 9W / 70W | 28MiB / 15109MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 144443 C nvidia-cuda-mps-server 25MiB |
+-----------------------------------------------------------------------------+
```
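Rather than repeatedly polling `kubectl get pods`, you can optionally block until both jobs complete and then pull out just the timing lines to compare against the run without context-sharing. This is a minimal sketch; the 10-minute timeout is an arbitrary example value, and the pod names are taken from the example output above, so substitute your own.
```powershell
# Wait until both jobs report the Complete condition (timeout is an example value).
kubectl wait --for=condition=complete job/cuda-sample1 -n mynamesp1 --timeout=600s
kubectl wait --for=condition=complete job/cuda-sample2 -n mynamesp1 --timeout=600s

# Extract only the "total time" line from each pod's log to compare with the earlier run.
kubectl logs -n mynamesp1 cuda-sample1-vcznt | Select-String "total time"
kubectl logs -n mynamesp1 cuda-sample2-zkx4w | Select-String "total time"
```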
## Delete deployment
You may need to delete deployments when running with MPS enabled and with MPS disabled on your device.
To delete the deployment on your device, run the following command:
```powershell
kubectl delete -f <Path to the deployment .yaml> -n <Name of the namespace>
```
Here is an example output:
```powershell
PS C:\WINDOWS\system32> kubectl delete -f 'C:\gpu-sharing\k8-gpusharing.yaml' -n mynamesp1
deployment.apps "cuda-sample1" deleted
deployment.apps "cuda-sample2" deleted
PS C:\WINDOWS\system32>
```
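To confirm that the jobs and their pods are gone before you redeploy, you can list the resources in the namespace. Both commands should return `No resources found.` once the deletion has completed; the namespace `mynamesp1` is an example value.
```powershell
# Verify that the jobs and their pods have been removed (namespace is an example).
kubectl get jobs -n mynamesp1
kubectl get pods -n mynamesp1
```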
## Next steps
- [Deploy an IoT Edge workload with GPU sharing on your Azure Stack Edge Pro](azure-stack-edge-gpu-deploy-iot-edge-gpu-sharing.md).