Dask-CUDA is a library extending Dask.distributed’s single-machine LocalCluster and Worker for use in distributed GPU workloads. It is a part of the RAPIDS suite of open-source software libraries for GPU-accelerated data science.
While Distributed can be used to leverage GPU workloads through libraries such as cuDF, CuPy, and Numba, Dask-CUDA offers several unique features unavailable to Distributed:
Automatic instantiation of per-GPU workers – Using Dask-CUDA’s LocalCUDACluster or
dask cuda workerCLI will automatically launch one worker for each GPU available on the executing node, avoiding the need to explicitly select GPUs.
Automatic setting of CPU affinity – The setting of CPU affinity for each GPU is done automatically, preventing memory transfers from taking suboptimal paths.
Automatic selection of InfiniBand devices – When UCX communication is enabled over InfiniBand, Dask-CUDA automatically selects the optimal InfiniBand device for each GPU (see UCX Integration for instructions on configuring UCX communication).
Memory spilling from GPU – For memory-intensive workloads, Dask-CUDA supports spilling from GPU to host memory when a GPU reaches the default or user-specified memory utilization limit.
Allocation of GPU memory – when using UCX communication, per-GPU memory pools can be allocated using RAPIDS Memory Manager to circumvent the costly memory buffer mappings that would be required otherwise.