Enabling UCX communication
A CUDA cluster using UCX communication can be started automatically with LocalCUDACluster or manually with the dask cuda worker CLI tool.
In either case, a dask.distributed.Client must be created for the cluster using the same Dask UCX configuration; see UCX Integration – Configuration for details on all available options.
LocalCUDACluster with Automatic Configuration
Automatic configuration was introduced in Dask-CUDA 22.02 and requires UCX >= 1.11.1. It allows the user to specify only the UCX protocol, letting UCX decide which transports to use.
To connect a client to a cluster with automatically-configured UCX and an RMM pool:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    protocol="ucx",        # use UCX instead of TCP for communication
    interface="ib0",       # listen on the InfiniBand interface
    rmm_pool_size="1GB",   # pre-allocate a 1 GB RMM pool on each GPU
)
client = Client(cluster)
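As a quick sanity check (a minimal sketch beyond the example above, not part of the original), the address reported by the client should use the ucx:// protocol when UCX was enabled successfully:

# The scheduler address begins with "ucx://" when UCX communication is active
assert client.scheduler_info()["address"].startswith("ucx://")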
Note
The interface="ib0" argument is intentionally specified above to ensure RDMACM is used on systems that support InfiniBand. On systems that don't support InfiniBand, or where RDMACM isn't required, the interface argument may be omitted or set to listen on a different interface.
LocalCUDACluster with Manual Configuration
When using LocalCUDACluster with UCX communication and manual configuration, all required UCX configuration is handled through arguments supplied at construction; see API – Cluster for a complete list of these arguments. To connect a client to a cluster with all supported transports and an RMM pool:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    protocol="ucx",
    interface="ib0",
    enable_tcp_over_ucx=True,   # TCP over UCX
    enable_nvlink=True,         # NVLink for intra-node GPU transfers
    enable_infiniband=True,     # InfiniBand for inter-node transfers
    enable_rdmacm=True,         # RDMA connection manager
    rmm_pool_size="1GB",
)
client = Client(cluster)
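To exercise the GPU transports end to end, a small CuPy-backed computation can be run on the cluster. This is a minimal sketch that assumes CuPy is installed alongside Dask; the array sizes are illustrative only:

import cupy
import dask.array as da

# Build a CuPy-backed Dask array; transfers of these device buffers
# between workers can then use NVLink and InfiniBand through UCX.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random_sample((10000, 10000), chunks=(1000, 1000))
print(x.sum().compute())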
dask cuda worker with Automatic Configuration
When using dask cuda worker with UCX communication and automatic configuration, the scheduler, workers, and client must all be started manually, but without specifying any UCX transports explicitly. This is only supported in Dask-CUDA 22.02 and newer and requires UCX >= 1.11.1.
Scheduler
For automatic UCX configuration, we must ensure a CUDA context is created on the scheduler before UCX is initialized. This can be satisfied by specifying the DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True environment variable when creating the scheduler.
To start a Dask scheduler using UCX with automatic configuration and a 1 GB RMM pool:
$ DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True \
> DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
> UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda \
> dask scheduler --protocol ucx --interface ib0
Note
The --interface ib0 flag is intentionally specified above to ensure RDMACM is used on systems that support InfiniBand. On systems that don't support InfiniBand, or where RDMACM isn't required, the --interface flag may be omitted or set to listen on a different interface.
We specify UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda above for optimal performance with InfiniBand; see details here. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is the default and may be omitted even when using InfiniBand.
Workers
To start workers with automatic UCX configuration and a 14 GB RMM pool per GPU:
$ UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda \
> dask cuda worker ucx://<scheduler_address>:8786 \
> --rmm-pool-size="14GB" \
> --interface="ib0"
Note
Analogous to the scheduler setup, the --interface="ib0" flag is intentionally specified above to ensure RDMACM is used on systems that support InfiniBand. On systems that don't support InfiniBand, or where RDMACM isn't required, the --interface flag may be omitted or set to listen on a different interface.
We specify UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda above for optimal performance with InfiniBand; see details here. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is the default and may be omitted even when using InfiniBand.
Client
To connect a client to the cluster we started above with automatic UCX configuration:
import os

# Must be set before UCX is initialized, i.e., before creating the client
os.environ["UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES"] = "cuda"

import dask
from dask.distributed import Client

with dask.config.set({"distributed.comm.ucx.create_cuda_context": True}):
    client = Client("ucx://<scheduler_address>:8786")
Alternatively, the with dask.config.set statement from the example above may be omitted and the DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT=True environment variable specified instead:
import os
os.environ["UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES"] = "cuda"
os.environ["DASK_DISTRIBUTED__COMM__UCX__CREATE_CUDA_CONTEXT"] = "True"
from dask.distributed import Client
client = Client("ucx://<scheduler_address>:8786")
Note
We specify UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda above for optimal performance with InfiniBand; see details here. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is the default and may be omitted even when using InfiniBand.
dask cuda worker with Manual Configuration
When using dask cuda worker with UCX communication and manual configuration, the scheduler, workers, and client must all be started manually, each using the same UCX configuration.
Scheduler
UCX configuration options will need to be specified for dask scheduler as environment variables; see Dask Configuration – Environment Variables for more details on the mapping between environment variables and options.
To start a Dask scheduler using UCX with all supported transports and a 1 GB RMM pool:
$ DASK_DISTRIBUTED__COMM__UCX__CUDA_COPY=True \
> DASK_DISTRIBUTED__COMM__UCX__TCP=True \
> DASK_DISTRIBUTED__COMM__UCX__NVLINK=True \
> DASK_DISTRIBUTED__COMM__UCX__INFINIBAND=True \
> DASK_DISTRIBUTED__COMM__UCX__RDMACM=True \
> DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
> dask scheduler --protocol ucx --interface ib0
We communicate to the scheduler that we will be using UCX with the --protocol option, and that we will be using InfiniBand with the --interface option.
Workers
All UCX configuration options have analogous options in dask cuda worker; see API – Worker for a complete list of these options.
To start workers with all supported transports and a 1 GB RMM pool:
$ dask cuda worker ucx://<scheduler_address>:8786 \
> --enable-tcp-over-ucx \
> --enable-nvlink \
> --enable-infiniband \
> --enable-rdmacm \
> --rmm-pool-size="1GB"
Client
A client can be configured to use UCX with dask_cuda.initialize, a utility that takes the same UCX configuration arguments as LocalCUDACluster and adds them to the current Dask configuration, which is then used when creating the client; see API – Client initialization for a complete list of arguments.
To connect a client to the cluster we have made:
from dask.distributed import Client
from dask_cuda.initialize import initialize

# initialize() must be called before the client is created so the UCX
# configuration is in place when the connection is established
initialize(
    enable_tcp_over_ucx=True,
    enable_nvlink=True,
    enable_infiniband=True,
    enable_rdmacm=True,
)
client = Client("ucx://<scheduler_address>:8786")