AWS Elastic Kubernetes Service (EKS)
RAPIDS can be deployed on AWS via the Elastic Kubernetes Service (EKS).
To run RAPIDS you’ll need a Kubernetes cluster with GPUs available.
Prerequisites
First you’ll need to have the aws CLI tool and the eksctl CLI tool installed, along with kubectl, helm, etc. for managing Kubernetes.
Ensure you are logged in to the aws CLI.
$ aws configure
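Before moving on, you may want to confirm that these tools are on your PATH and that the aws CLI has working credentials. The snippet below is a minimal bash sketch: the function name check_prereqs is hypothetical (it is not part of any of these tools), and aws sts get-caller-identity is used only as a convenient way to confirm that credentials are configured.

```bash
# Hypothetical helper: report any required CLI tools missing from PATH.
# Pass tool names as arguments, or rely on the default list used in this guide.
# Returns 0 if every tool is found, 1 otherwise.
check_prereqs() {
  local tools=("$@")
  local tool missing=0
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(aws eksctl kubectl helm)
  fi
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (needs the real tools and AWS credentials):
#   check_prereqs && aws sts get-caller-identity
```

Calling check_prereqs with no arguments checks aws, eksctl, kubectl and helm; each missing tool is reported on stderr.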
Create the Kubernetes cluster
Now we can launch a GPU-enabled EKS cluster with eksctl. Replace <public key ID> with the name of an existing EC2 key pair, or the path to the SSH public key you want to use for the nodes.
$ eksctl create cluster rapids \
  --version 1.29 \
  --nodes 3 \
  --node-type=p3.8xlarge \
  --timeout=40m \
  --ssh-access \
  --ssh-public-key <public key ID> \
  --region us-east-1 \
  --zones=us-east-1c,us-east-1b,us-east-1d \
  --auto-kubeconfig
With this command, you’ve launched an EKS cluster called rapids with three nodes of type p3.8xlarge in the us-east-1 region.
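Not every instance type is offered in every availability zone, so the --zones list in the command above needs to name zones where p3.8xlarge is available. One way to check is aws ec2 describe-instance-type-offerings. The gpu_zones function below is a hypothetical helper that wraps that call and prints a comma-separated list in the format --zones accepts; the default instance type and region are assumptions taken from this guide.

```bash
# Hypothetical helper: list the availability zones in a region that offer a
# given instance type, joined with commas for eksctl's --zones flag.
gpu_zones() {
  local instance_type="${1:-p3.8xlarge}" region="${2:-us-east-1}"
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$instance_type" \
    --region "$region" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text |
    tr '\t' '\n' | sort | paste -s -d, -   # one zone per line, then comma-join
}

# Usage:
#   gpu_zones p3.8xlarge us-east-1
```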
To access the cluster with kubectl, we need to pull its credentials into our kubeconfig.
$ aws eks --region us-east-1 update-kubeconfig --name rapids
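Nodes can take a few minutes to register after the cluster is created. A minimal sketch of waiting for them, assuming kubectl now points at the rapids cluster; wait_for_nodes is a hypothetical helper name and its 10m default timeout is an arbitrary choice.

```bash
# Hypothetical helper: block until every node in the current kubeconfig
# context reports Ready, then list the nodes with their instance types.
wait_for_nodes() {
  local timeout="${1:-10m}"
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" &&
    kubectl get nodes -L node.kubernetes.io/instance-type
}

# Usage:
#   wait_for_nodes 15m
```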
Install drivers
As we selected a GPU node type, EKS will automatically install drivers for us. We can verify this by listing the NVIDIA device plugin Pods.
$ kubectl get po -n kube-system -l name=nvidia-device-plugin-ds
NAME                                   READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-daemonset-kv7t5   1/1     Running   0          52m
nvidia-device-plugin-daemonset-rhmvx   1/1     Running   0          52m
nvidia-device-plugin-daemonset-thjhc   1/1     Running   0          52m
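With the device plugin running, each node should advertise its GPUs as the allocatable resource nvidia.com/gpu; a p3.8xlarge has four GPUs, so each node should report 4. The sketch below prints that value per node and fails if any node reports none. The helper name check_node_gpus is hypothetical; the custom-columns output and the escaped dot in nvidia\.com/gpu are standard kubectl behaviour.

```bash
# Hypothetical helper: print each node's allocatable nvidia.com/gpu count and
# return non-zero if any node advertises no GPUs (or if no nodes are listed).
check_node_gpus() {
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' |
    awk 'BEGIN { bad = 0 }
         { print; if ($2 + 0 < 1) bad = 1 }   # "<none>" converts to 0
         END { if (NR == 0) bad = 1; exit bad }'
}

# Usage:
#   check_node_gpus
```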
Note
By default this plugin will install the latest version of the NVIDIA drivers on every Node. If you need more control over your driver installation, we recommend that you create your cluster with eksctl create cluster --install-nvidia-plugin=false ... and then install drivers yourself using the NVIDIA GPU Operator.
After you have confirmed your drivers are installed, you are ready to test your cluster.
Let’s create a sample pod that uses some GPU compute to make sure that everything is working as expected.
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.6.0-ubuntu18.04"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
$ kubectl logs pod/cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
If you see Test PASSED in the output, you can be confident that your Kubernetes cluster has GPU compute set up correctly.
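If you read the logs too early the container may still be pulling or running. A minimal sketch that waits for the Pod to reach the Succeeded phase and then checks its logs for Test PASSED; check_vectoradd is a hypothetical helper, the 5m timeout is arbitrary, and --for=jsonpath needs a kubectl release that supports it (v1.23 or later).

```bash
# Hypothetical helper: wait for the test Pod to finish, then succeed only if
# its logs contain "Test PASSED".
check_vectoradd() {
  local pod="${1:-cuda-vectoradd}"
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/$pod" \
    --timeout=5m >/dev/null &&
    kubectl logs "pod/$pod" | grep -q "Test PASSED"
}

# Usage:
#   check_vectoradd && echo "GPU test passed"
```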
Next, clean up that pod.
$ kubectl delete pod cuda-vectoradd
pod "cuda-vectoradd" deleted
Install RAPIDS
Now that you have a GPU-enabled Kubernetes cluster on EKS, you can install RAPIDS with any of the supported methods.
Clean up
You can also delete the EKS cluster with the following command to stop incurring charges.
$ eksctl delete cluster --region=us-east-1 --name=rapids
Deleting cluster rapids...⠼
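To double-check afterwards that the cluster is really gone, you could query EKS directly. cluster_deleted below is a hypothetical helper that assumes the cluster name and region used in this guide; it distinguishes "still listed" from "the AWS call itself failed" so that missing credentials are not mistaken for a successful deletion.

```bash
# Hypothetical helper: returns 0 if the named cluster no longer appears in
# `aws eks list-clusters`, 1 if it is still listed, 2 if the AWS call fails.
cluster_deleted() {
  local name="${1:-rapids}" region="${2:-us-east-1}" clusters
  clusters="$(aws eks list-clusters --region "$region" \
    --query 'clusters[]' --output text)" || return 2
  # Text output is tab-separated; split one name per line and match exactly.
  ! printf '%s\n' "$clusters" | tr '\t' '\n' | grep -qx -- "$name"
}

# Usage:
#   cluster_deleted rapids us-east-1 && echo "rapids is gone"
```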