Dask-CUDA is a library extending LocalCluster from dask.distributed to enable multi-GPU workloads.


You can use LocalCUDACluster to create a cluster of one or more GPUs on your local machine, and launch a Dask scheduler on it to parallelize and distribute your RAPIDS workflows across multiple GPUs on a single node.

In addition to enabling multi-GPU computation, LocalCUDACluster also provides a simple interface for managing the cluster, such as starting and stopping the cluster, querying the status of the nodes, and monitoring the workload distribution.


Before following these instructions, ensure you have installed the dask and dask-cuda packages in your local environment.

Cluster setup

Instantiate a LocalCUDACluster object

The LocalCUDACluster class autodetects the GPUs in your system, so if you create it on a machine with two GPUs it will create a cluster with two workers, each of which is responsible for executing tasks on a separate GPU.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()

You can also restrict your cluster to specific GPUs by setting the CUDA_VISIBLE_DEVICES environment variable, or by passing it as a keyword argument.

cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0,1",
)  # Creates one worker each for GPUs 0 and 1
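The environment-variable route can be sketched as follows (a minimal example; the variable must be set before LocalCUDACluster is instantiated, since workers inherit it at startup):

```python
import os

# Restrict the visible GPUs via the environment instead of a keyword argument.
# This must be set before LocalCUDACluster() is created.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```
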

Connecting a Dask client

The Dask scheduler coordinates the execution of tasks, whereas the Dask client is the user-facing interface that submits tasks to the scheduler and monitors their progress.

client = Client(cluster)


To test the cluster, use the distributed client to submit a task that queries the GPU model on a worker.

def get_gpu_model():
    import pynvml

    pynvml.nvmlInit()  # NVML must be initialized before querying devices
    return pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))

result = client.submit(get_gpu_model).result()

# b'Tesla V100-SXM2-16GB'
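Depending on the pynvml version, the device name may come back as bytes (as in the output above) or as str. A small helper to normalize it for display (normalize_name is a hypothetical name, not part of dask-cuda or pynvml):

```python
def normalize_name(name):
    """Return the GPU model as a str, whether pynvml returned bytes or str."""
    return name.decode() if isinstance(name, bytes) else name

# Works for both return types:
normalize_name(b"Tesla V100-SXM2-16GB")  # -> "Tesla V100-SXM2-16GB"
normalize_name("Tesla V100-SXM2-16GB")   # -> "Tesla V100-SXM2-16GB"
```
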