Dask-CUDA is a library extending Dask.distributed’s single-machine LocalCluster and Worker for use in distributed GPU workloads. It is a part of the RAPIDS suite of open-source software libraries for GPU-accelerated data science.


While Distributed can run GPU workloads through libraries such as cuDF, CuPy, and Numba, Dask-CUDA offers several features that Distributed alone does not provide:

  • Automatic instantiation of per-GPU workers – Using Dask-CUDA’s LocalCUDACluster or the dask cuda worker CLI automatically launches one worker for each GPU available on the executing node, avoiding the need to explicitly select GPUs.

  • Automatic setting of CPU affinity – Dask-CUDA sets the CPU affinity for each GPU automatically, preventing memory transfers from taking suboptimal paths.

  • Automatic selection of InfiniBand devices – When UCX communication is enabled over InfiniBand, Dask-CUDA automatically selects the optimal InfiniBand device for each GPU (see UCX Integration for instructions on configuring UCX communication).

  • Memory spilling from GPU – For memory-intensive workloads, Dask-CUDA supports spilling from GPU to host memory when a GPU reaches the default or user-specified memory utilization limit.

  • Allocation of GPU memory – When using UCX communication, per-GPU memory pools can be allocated using RAPIDS Memory Manager to circumvent the costly memory buffer mappings that would be required otherwise.
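
As a rough illustration of how the features above map onto cluster options, the following is a minimal sketch rather than a recommended configuration. The helper names `build_cluster_kwargs` and `start_cluster` are hypothetical and exist only for this example; the keyword arguments (`device_memory_limit`, `protocol`, `enable_infiniband`, `rmm_pool_size`) are passed through to `LocalCUDACluster`, and the default values shown (an 80% spill threshold, for instance) are assumptions chosen for illustration.

```python
def build_cluster_kwargs(use_ucx=False, device_memory_limit=0.8, rmm_pool_size=None):
    """Assemble keyword arguments for LocalCUDACluster (hypothetical helper).

    device_memory_limit: GPU memory threshold at which spilling to host
        memory starts; 0.8 (80% of each GPU) is an illustrative value.
    rmm_pool_size: size of the per-GPU RMM memory pool, e.g. "24GB";
        left out entirely when None.
    """
    kwargs = {"device_memory_limit": device_memory_limit}
    if use_ucx:
        # UCX communication over InfiniBand; Dask-CUDA then selects the
        # InfiniBand device for each GPU.
        kwargs["protocol"] = "ucx"
        kwargs["enable_infiniband"] = True
    if rmm_pool_size is not None:
        kwargs["rmm_pool_size"] = rmm_pool_size
    return kwargs


def start_cluster(**kwargs):
    """Start one worker per visible GPU and connect a client (hypothetical helper).

    Imports are deferred so the configuration helper above can be used
    on machines without GPUs or without dask_cuda installed.
    """
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(**kwargs)
    client = Client(cluster)
    return cluster, client
```

On a node with GPUs and Dask-CUDA installed, calling `start_cluster(**build_cluster_kwargs())` should launch one worker per available GPU without any explicit GPU selection. Passing `use_ucx=True` additionally assumes that UCX and InfiniBand hardware are set up as described in UCX Integration.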