Dask-CUDA

Dask-CUDA is a library that extends Dask.distributed’s single-machine LocalCluster and Worker for use in distributed GPU workloads. It is part of the RAPIDS suite of open-source software libraries for GPU-accelerated data science.

Motivation

While Distributed can run GPU workloads through libraries such as cuDF, CuPy, and Numba, Dask-CUDA offers several features that Distributed does not provide on its own:

Automatic instantiation of per-GPU workers – Using Dask-CUDA’s LocalCUDACluster or the dask cuda worker CLI automatically launches one worker for each GPU available on the executing node, avoiding the need to explicitly select GPUs.
Automatic setting of CPU affinity – CPU affinity is set automatically for each GPU, preventing memory transfers from taking suboptimal paths.
Automatic selection of InfiniBand devices – When UCX communication is enabled over InfiniBand, Dask-CUDA automatically selects the optimal InfiniBand device for each GPU (see UCX Integration for instructions on configuring UCX communication).
Memory spilling from GPU – For memory-intensive workloads, Dask-CUDA supports spilling from GPU to host memory when a GPU reaches the default or user-specified memory utilization limit.
Allocation of GPU memory – When using UCX communication, per-GPU memory pools can be allocated using RAPIDS Memory Manager to circumvent the costly memory buffer mappings that would otherwise be required.
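
As a rough illustration of how several of the features above map onto cluster configuration, the following is a minimal sketch, not a definitive setup. The helpers cluster_options and start_cluster are hypothetical names introduced only for this example; device_memory_limit, rmm_pool_size, and protocol are keyword arguments of LocalCUDACluster, and the 0.8 spill fraction and "1GB" pool size are arbitrary illustrative values. Starting the cluster assumes dask-cuda is installed and at least one GPU is visible.

```python
def cluster_options(spill_fraction=0.8, rmm_pool_size=None, use_ucx=False):
    """Build keyword arguments for dask_cuda.LocalCUDACluster.

    Hypothetical helper for illustration; it is not part of Dask-CUDA.
    """
    if not 0.0 < spill_fraction <= 1.0:
        raise ValueError("spill_fraction must be in (0, 1]")

    # A float device_memory_limit is read as a fraction of total device
    # memory; the worker starts spilling to host once it is exceeded.
    options = {"device_memory_limit": spill_fraction}

    if rmm_pool_size is not None:
        # Per-GPU RMM memory pool, e.g. "1GB".
        options["rmm_pool_size"] = rmm_pool_size

    if use_ucx:
        # Switch worker communication from the default to UCX.
        options["protocol"] = "ucx"

    return options


def start_cluster(**kwargs):
    """Start one worker per visible GPU and connect a client to the cluster."""
    # Imported lazily so cluster_options can be used on a machine without GPUs.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(**cluster_options(**kwargs))
    return cluster, Client(cluster)
```

On a GPU node, calling start_cluster(spill_fraction=0.8, rmm_pool_size="1GB") would be expected to launch one worker per available GPU without any explicit device selection. UCX-specific options are left out here; see UCX Integration for those.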

Contents

Getting Started

Additional Features

Examples