Elastic Compute Cloud (EC2)

Create Instance

Create a new EC2 instance with GPUs, the NVIDIA driver, and the NVIDIA Container Runtime.

NVIDIA maintains an Amazon Machine Image (AMI) that pre-installs NVIDIA drivers and container runtimes; we recommend using this image as the starting point.

  1. Open the EC2 Dashboard.

  2. Select Launch Instance.

  3. In the AMI selection box, search for “nvidia”, then switch to the AWS Marketplace AMIs tab.

  4. Select NVIDIA GPU-Optimized AMI and click “Select”. Then, in the new popup, select Subscribe on Instance Launch.

  5. Under Key pair, select your SSH key pair (create one first if you haven’t already).

  6. Under Network settings, create a security group (or choose an existing one) that allows SSH access on port 22 as well as ports 8888, 8786, and 8787 for Jupyter and Dask.

  7. Select Launch.
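If you would rather script step 6 than click through the console, the rules can be expressed in the IpPermissions structure accepted by boto3's EC2 client method authorize_security_group_ingress. The sketch below is a minimal illustration under stated assumptions: build_ingress_rules and apply_rules are hypothetical helper names, apply_rules assumes boto3 is installed with AWS credentials configured, and the security group ID and source CIDR are values you supply. Restricting the CIDR to your own address (a /32) is safer than 0.0.0.0/0, because anyone who can reach ports 8888 and 8787 can reach Jupyter and the Dask dashboard.

```python
import ipaddress
from typing import Dict, List

# Ports from step 6: SSH, the Dask scheduler and dashboard defaults, and Jupyter.
RAPIDS_PORTS: Dict[int, str] = {
    22: "SSH",
    8786: "Dask scheduler",
    8787: "Dask dashboard",
    8888: "Jupyter",
}


def build_ingress_rules(cidr: str, ports: Dict[int, str] = RAPIDS_PORTS) -> List[dict]:
    """Build one TCP ingress rule per port in the EC2 IpPermissions shape."""
    network = ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges (IpRanges)")
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network), "Description": label}],
        }
        for port, label in sorted(ports.items())
    ]


def apply_rules(group_id: str, cidr: str) -> None:
    """Hypothetical helper: attach the rules to an existing security group."""
    import boto3  # assumption: boto3 installed and AWS credentials configured

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=build_ingress_rules(cidr)
    )


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address; use your own public IP instead.
    for rule in build_ingress_rules("203.0.113.10/32"):
        print(rule["FromPort"], rule["IpRanges"][0]["Description"])
```

Building the rule list separately from the AWS call keeps the port choices easy to inspect before anything is changed in your account.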

Connect to the instance

Next, connect to the instance.

  1. Open the EC2 Dashboard.

  2. Locate your VM and note the Public IP Address.

  3. In your terminal, run ssh ubuntu@<ip address>.

Note

If you use the AWS Console, please use the default ubuntu user to ensure the NVIDIA driver installs on the first boot.
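A newly launched instance may take a short while before it accepts SSH connections. The following standard-library sketch polls a TCP port until a connection succeeds or a timeout elapses; the name wait_for_port and its default timeout and polling interval are illustrative choices, not part of any AWS tool. The same check can be pointed at ports 8888 or 8787 later, provided your security group allows them.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("203.0.113.10") returns True once SSH is listening on that (documentation) address; substitute your instance's public IP and then run the ssh command from step 3.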

Install RAPIDS

There is a selection of methods you can use to install RAPIDS, which you can see in the RAPIDS release selector.

For this example, we are going to run the RAPIDS Docker container, so we need to know the name of the most recent container image. In the release selector, choose Docker in the Method column.

Then copy the commands shown:

docker pull nvcr.io/nvidia/rapidsai/notebooks:25.02-cuda12.8-py3.12
docker run --gpus all --rm -it \
    --shm-size=1g --ulimit memlock=-1 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    nvcr.io/nvidia/rapidsai/notebooks:25.02-cuda12.8-py3.12
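The image tag above appears to follow the pattern <RAPIDS version>-cuda<CUDA version>-py<Python version>. The sketch below rebuilds the same docker run command from those parts and annotates each flag; rapids_image and docker_run_args are hypothetical helper names, and not every combination of versions is necessarily published, so confirm the tag in the release selector before changing the defaults.

```python
import shlex
from typing import List, Sequence


def rapids_image(rapids: str = "25.02", cuda: str = "12.8", python: str = "3.12") -> str:
    """Compose an image reference using the tag layout seen in the command above."""
    return f"nvcr.io/nvidia/rapidsai/notebooks:{rapids}-cuda{cuda}-py{python}"


def docker_run_args(image: str, ports: Sequence[int] = (8888, 8787, 8786)) -> List[str]:
    """Rebuild the docker run command above as an argument list."""
    args = [
        "docker", "run",
        "--gpus", "all",           # expose all host GPUs to the container
        "--rm",                    # remove the container when it exits
        "-it",                     # interactive session with a terminal attached
        "--shm-size=1g",           # enlarge /dev/shm beyond Docker's 64 MB default
        "--ulimit", "memlock=-1",  # remove the max locked-memory limit (RLIMIT_MEMLOCK)
    ]
    for port in ports:
        args += ["-p", f"{port}:{port}"]  # publish host port to the same container port
    args.append(image)
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_args(rapids_image())))
```

Passing the list to subprocess.run(args, check=True) from an interactive terminal would launch the container without any shell quoting concerns.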

Note

If you see a “docker socket permission denied” error while running these commands, try closing your SSH session and reconnecting. This happens because your user was added to the docker group only after you signed in.
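The cause described in the note above can be checked directly: a login session keeps the group list it started with, so a user can be recorded as a member of docker in the group database while the current shell still lacks that group. This Linux-only, standard-library sketch tells those cases apart; docker_group_status is a hypothetical name, and the status strings it returns are arbitrary labels chosen for illustration.

```python
import getpass
import grp
import os


def docker_group_status(group_name: str = "docker") -> str:
    """Return "ok", "reconnect", "not-member", or "no-group" for the current session."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return "no-group"  # the group does not exist on this machine
    # Groups held by this process; they were fixed when the session started.
    session_gids = set(os.getgroups()) | {os.getgid()}
    if group.gr_gid in session_gids:
        return "ok"
    # gr_mem lists the supplementary members recorded in the group database.
    if getpass.getuser() in group.gr_mem:
        return "reconnect"  # added after login: open a new SSH session
    return "not-member"


if __name__ == "__main__":
    print(docker_group_status())
```

A result of "reconnect" matches the situation in the note: signing out and back in starts a session that includes the docker group.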

Note

If you see a “modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.2.0-1011-aws” error when first connecting to the EC2 instance, try logging out and reconnecting.
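The error above comes from modprobe looking under /lib/modules/<kernel release>. After reconnecting, one way to confirm that the nvidia kernel module is now loaded is to look for it in /proc/modules, where the first field of each line is a module name. This is a Linux-only sketch with a hypothetical helper name; running nvidia-smi is another common check.

```python
import os
from pathlib import Path
from typing import Optional


def nvidia_module_loaded(modules_text: Optional[str] = None) -> bool:
    """Return True if a module named exactly "nvidia" is listed as loaded."""
    if modules_text is None:
        try:
            modules_text = Path("/proc/modules").read_text()
        except OSError:  # e.g. not running on Linux
            return False
    # Each line looks like: "<name> <size> <refcount> <deps> <state> <offset>".
    return any(line.split()[:1] == ["nvidia"] for line in modules_text.splitlines())


if __name__ == "__main__":
    # The kernel release is the directory modprobe searched, e.g. 6.2.0-1011-aws.
    print("kernel release:", os.uname().release)
    print("nvidia module loaded:", nvidia_module_loaded())
```

Matching the exact name avoids false positives from related modules such as nvidia_uvm, which can appear on their own lines.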

Test RAPIDS

To access Jupyter, navigate to <VM ip>:8888 in the browser.

In a Python notebook, check that you can import and use RAPIDS libraries like cudf.

In [1]: import cudf
In [2]: df = cudf.datasets.timeseries()
In [3]: df.head()
Out[3]:
                       id     name         x         y
timestamp
2000-01-01 00:00:00  1020    Kevin  0.091536  0.664482
2000-01-01 00:00:01   974    Frank  0.683788 -0.467281
2000-01-01 00:00:02  1000  Charlie  0.419740 -0.796866
2000-01-01 00:00:03  1019    Edith  0.488411  0.731661
2000-01-01 00:00:04   998    Quinn  0.651381 -0.525398

Open cudf/10min.ipynb and execute the cells to learn more about how cudf works.

When running a Dask cluster, you can also visit <VM ip>:8787 to monitor the cluster's status.

Related Examples

HPO with dask-ml and cuml

airline numpy pandas xgboost dask dask-cuda dask-ml cuml aws/ec2 azure/azure-vm gcp/compute-engine ibm/virtual-server sklearn s3 hpo

Measuring Performance with the One Billion Row Challenge

dask-cuda csv cudf cupy dask pandas aws/ec2 aws/sagemaker azure/azure-vm azure/ml gcp/compute-engine gcp/vertex-ai

HPO Benchmarking with RAPIDS and Dask

aws/ec2 s3 randomforest hpo xgboost dask dask-cuda optuna sklearn dask-ml
