Dask-CUDA
=========

Dask-CUDA is a library extending `Dask.distributed `_'s single-machine `LocalCluster `_ and `Worker `_ for use in distributed GPU workloads. It is a part of the `RAPIDS `_ suite of open-source software libraries for GPU-accelerated data science.

Motivation
----------

While Distributed can be used to leverage GPU workloads through libraries such as `cuDF `_, `CuPy `_, and `Numba `_, Dask-CUDA offers several unique features unavailable to Distributed:

- **Automatic instantiation of per-GPU workers** -- Using Dask-CUDA's LocalCUDACluster or ``dask cuda worker`` CLI will automatically launch one worker for each GPU available on the executing node, avoiding the need to explicitly select GPUs.
- **Automatic setting of CPU affinity** -- The setting of CPU affinity for each GPU is done automatically, preventing memory transfers from taking suboptimal paths.
- **Automatic selection of InfiniBand devices** -- When UCX communication is enabled over InfiniBand, Dask-CUDA automatically selects the optimal InfiniBand device for each GPU (see `UCX Integration `_ for instructions on configuring UCX communication).
- **Memory spilling from GPU** -- For memory-intensive workloads, Dask-CUDA supports spilling from GPU to host memory when a GPU reaches the default or user-specified memory utilization limit.
- **Allocation of GPU memory** -- When using UCX communication, per-GPU memory pools can be allocated using `RAPIDS Memory Manager `_ to circumvent the costly memory buffer mappings that would otherwise be required.

A minimal usage sketch illustrating several of these features appears at the end of this page.

Contents
--------

.. toctree::
   :maxdepth: 1
   :caption: Getting Started

   install
   quickstart
   troubleshooting
   api

.. toctree::
   :maxdepth: 1
   :caption: Additional Features

   ucx
   explicit_comms
   spilling

.. toctree::
   :maxdepth: 1
   :caption: Examples

   examples/best-practices
   examples/worker_count
   examples/ucx
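
Example
-------

As a sketch of how the features above fit together, the following starts a single-node cluster with one worker per GPU, a GPU-to-host spilling threshold, and a per-GPU RMM memory pool. The threshold and pool size shown are illustrative assumptions rather than recommended values; see the quickstart and spilling pages for details.

.. code-block:: python

    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # LocalCUDACluster automatically launches one worker per GPU
    # visible on this node and sets CPU affinity for each of them.
    cluster = LocalCUDACluster(
        device_memory_limit=0.8,  # spill from GPU to host at 80% utilization (illustrative)
        rmm_pool_size="10GB",     # per-GPU RMM memory pool size (illustrative)
    )
    client = Client(cluster)

    # The client can now be used as usual with GPU-accelerated
    # libraries such as cuDF, CuPy, or Numba.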