Virtual Server for VPC#

Create Instance#

Create a new Virtual Server (for VPC) with GPUs, the NVIDIA Driver and the NVIDIA Container Runtime.

  1. Open the Virtual Server Dashboard.

  2. Select Create.

  3. Give the server a name and select your resource group.

  4. Under Operating System choose Ubuntu Linux.

  5. Under Profile select View all profiles and select a profile with NVIDIA GPUs.

  6. Under SSH Keys choose your SSH key.

  7. Under network settings create a security group (or choose an existing one) that allows SSH access on port 22 and also opens ports 8888 (Jupyter), 8786 (Dask scheduler) and 8787 (Dask dashboard).

  8. Select Create Virtual Server.

Create floating IP#

To access the virtual server we need to attach a public IP address.

  1. Open Floating IPs.

  2. Select Reserve.

  3. Give the Floating IP a name.

  4. Under Resource to bind select the virtual server you just created.

Connect to the instance#

Next we need to connect to the instance.

  1. Open Floating IPs.

  2. Locate the IP you just created and note the address.

  3. In your terminal, run ssh root@<ip address>, replacing <ip address> with the address you noted.

Note

For a short guide on launching your instance and accessing it, read the Getting Started with IBM Virtual Server Documentation.
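
If the SSH connection hangs, the security group rules are a likely cause. The following is a minimal sketch, run from your local machine, that reports which of the ports from the security group step accept TCP connections. The helper name check_ports is hypothetical and not part of any IBM or RAPIDS tooling. Ports 8888, 8786 and 8787 will only report as open once the RAPIDS container from the later steps is running.

```python
import socket

# Ports opened in the security group step of this guide.
PORTS = {22: "SSH", 8888: "Jupyter", 8786: "Dask scheduler", 8787: "Dask dashboard"}


def check_ports(host, ports=PORTS, timeout=3.0):
    """Return {port: bool}, True where a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or no route.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Call check_ports with the floating IP address you noted above. A False entry means the port is closed, filtered by the security group, or has nothing listening on it yet.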

Install NVIDIA Drivers#

Next we need to install the NVIDIA drivers and container runtime.

  1. Ensure build essentials are installed by running apt-get update && apt-get install build-essential -y

  2. Install the NVIDIA drivers.

  3. Install Docker and the NVIDIA Docker runtime.

How do I check everything installed successfully?

You can check everything installed correctly by running nvidia-smi in a container.

$ docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:04:01.0 Off |                    0 |
| N/A   33C    P0    36W / 250W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
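
For a scripted check, the same information can be read without parsing the table by eye. The sketch below assumes it runs directly on the virtual server with the driver installed, and relies on the CSV query mode of nvidia-smi (--query-gpu with --format=csv,noheader). The function names are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi; the order here defines the CSV column order.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def list_gpus():
    """Run nvidia-smi on this machine and return one dict per visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)
```

An empty list from list_gpus(), or a CalledProcessError, suggests the driver is not loaded correctly.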

Install RAPIDS#

There are several methods you can use to install RAPIDS, all of which are listed in the RAPIDS release selector.

For this example we are going to run the RAPIDS Docker container, so we need to know the name of the most recent container. On the release selector choose Docker in the Method column.

Then copy the commands shown:

docker pull nvcr.io/nvidia/rapidsai/notebooks:24.06-cuda11.8-py3.10
docker run --gpus all --rm -it \
    --shm-size=1g --ulimit memlock=-1 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    nvcr.io/nvidia/rapidsai/notebooks:24.06-cuda11.8-py3.10

Note

If you see a “docker socket permission denied” error while running these commands, try closing your SSH session and reconnecting. This happens because your user was added to the docker group only after you signed in.

Test RAPIDS#

To access Jupyter, navigate to <VM ip>:8888 in the browser.

In a Python notebook, check that you can import and use RAPIDS libraries like cudf.

In [1]: import cudf
In [2]: df = cudf.datasets.timeseries()
In [3]: df.head()
Out[3]:
                       id     name         x         y
timestamp
2000-01-01 00:00:00  1020    Kevin  0.091536  0.664482
2000-01-01 00:00:01   974    Frank  0.683788 -0.467281
2000-01-01 00:00:02  1000  Charlie  0.419740 -0.796866
2000-01-01 00:00:03  1019    Edith  0.488411  0.731661
2000-01-01 00:00:04   998    Quinn  0.651381 -0.525398

Open cudf/10min.ipynb and execute the cells to explore more of how cudf works.

When running a Dask cluster you can also visit <VM ip>:8787 to monitor the Dask cluster status.
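
To confirm from a script that both web services are answering once the container is running, a minimal standard-library sketch follows. It treats any HTTP response, including an error status, as evidence that a server is listening. The helper names http_status and service_urls are hypothetical, and /status is assumed to be the Dask dashboard landing page.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (for example 403 or 404).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host unreachable.
        return None


def service_urls(host):
    """Build the Jupyter and Dask dashboard URLs for a given VM address."""
    return {
        "Jupyter": f"http://{host}:8888",
        "Dask dashboard": f"http://{host}:8787/status",
    }
```

Pass each URL from service_urls(<VM ip>) to http_status. A result of None for the Dask dashboard is expected until a Dask cluster has been started.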

Related Examples#

HPO with dask-ml and cuml
