This is a list of common issues encountered with Dask-CUDA and various systems.

Wrong Device Indexing

It’s common to rely on the device indexing reported by nvidia-smi when creating workers, and that is the default in Dask-CUDA. In most cases, nvidia-smi indices map one-to-one to the indices used by CUDA_VISIBLE_DEVICES, but on some systems the ordering does not match: nvidia-smi orders GPUs by their PCI Bus ID, whereas CUDA by default enumerates them fastest first. The mismatch is commonly seen on the DGX Station A100, which contains four A100 GPUs plus a display GPU, and the display GPU may not be the last GPU in PCI Bus ID order. To make CUDA follow the PCI Bus ID ordering as well, set the CUDA_DEVICE_ORDER=PCI_BUS_ID environment variable when starting the Python process:

$ CUDA_DEVICE_ORDER=PCI_BUS_ID dask-cuda-worker ...
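When the cluster is created from a Python script rather than from the command line, the same variable can be set in the process environment before any CUDA library is initialized. The following is a minimal sketch under that assumption. The helper names use_pci_bus_order and start_cluster are hypothetical and are not part of Dask-CUDA.

```python
import os


def use_pci_bus_order(env=os.environ):
    """Ask CUDA to enumerate GPUs by PCI Bus ID (hypothetical helper).

    This only has an effect if it runs before CUDA is initialized in
    the process; worker processes started afterwards inherit the value.
    """
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


def start_cluster(**kwargs):
    """Set the device order, then create a LocalCUDACluster (hypothetical helper)."""
    use_pci_bus_order()
    # Imported here so the environment variable is in place first.
    from dask_cuda import LocalCUDACluster

    return LocalCUDACluster(**kwargs)
```

Calling start_cluster() with no arguments would then create a cluster whose GPU indices are expected to follow the PCI Bus ID ordering.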

For the DGX Station A100, the display GPU is commonly the fourth in the PCI Bus ID ordering (index 3), so Dask-CUDA should be given GPUs 0, 1, 2 and 4:

>>> from dask_cuda import LocalCUDACluster
>>> cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0, 1, 2, 4])
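Rather than hard-coding the indices, the list can be derived from the output of nvidia-smi. The sketch below assumes that nvidia-smi is on the PATH and that the display GPU can be recognized by the substring "Display" in its name. Both function names are hypothetical, and the sample text is made up for illustration rather than captured from a real system.

```python
import subprocess


def query_gpus():
    """Return 'index, name, pci.bus_id' lines as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,pci.bus_id", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def compute_gpu_indices(csv_text, exclude_substring="Display"):
    """Pick indices of GPUs whose name lacks exclude_substring (an assumption)."""
    indices = []
    for line in csv_text.strip().splitlines():
        index, name, _bus_id = [field.strip() for field in line.split(",", 2)]
        if exclude_substring.lower() not in name.lower():
            indices.append(int(index))
    return indices


# Illustrative sample only; real names and bus IDs will differ.
SAMPLE = """\
0, NVIDIA A100-SXM4-40GB, 00000000:01:00.0
1, NVIDIA A100-SXM4-40GB, 00000000:47:00.0
2, NVIDIA A100-SXM4-40GB, 00000000:81:00.0
3, NVIDIA DGX Display, 00000000:C1:00.0
4, NVIDIA A100-SXM4-40GB, 00000000:C2:00.0
"""
```

On a real system, compute_gpu_indices(query_gpus()) could be passed as CUDA_VISIBLE_DEVICES to LocalCUDACluster, with CUDA_DEVICE_ORDER=PCI_BUS_ID set so that both tools agree on the indexing.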