Azure VM Cluster (via Dask)

Create a Cluster using Dask Cloud Provider

The easiest way to set up a multi-node, multi-GPU cluster on Azure is to use Dask Cloud Provider.

1. Install Dask Cloud Provider

Dask Cloud Provider can be installed via conda or pip. The Azure-specific capabilities require the [azure] pip extra.

$ pip install "dask-cloudprovider[azure]"
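Before moving on, it can help to confirm that the packages are importable from the Python environment you intend to use. The following is a minimal sketch: `is_installed` is a hypothetical helper (not part of any library), and the module names it checks are assumed from the package installed above. It only reports whether a module can be located, not whether its version is compatible.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed top-level module names for the packages used in this guide.
    for name in ("dask_cloudprovider", "distributed"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If either module is reported as missing, the install most likely went into a different environment than the one running Python.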

2. Configure your Azure Resources

Set up your Azure Resource Group, Virtual Network, and Security Group according to the Dask Cloud Provider instructions.

3. Create a Cluster

In a Python terminal, create the cluster using the dask_cloudprovider package. The example below creates a cluster with 2 workers in westus2 using Standard_NC12s_v3 VMs. The VMs should have at least 100 GB of disk space to accommodate the RAPIDS container image and related dependencies.

from dask_cloudprovider.azure import AzureVMCluster

# Replace the placeholders with the names of the resources created in step 2.
resource_group = "<RESOURCE_GROUP>"
vnet = "<VNET>"
security_group = "<SECURITY_GROUP>"
subscription_id = "<SUBSCRIPTION_ID>"

cluster = AzureVMCluster(
    resource_group=resource_group,
    vnet=vnet,
    security_group=security_group,
    subscription_id=subscription_id,
    location="westus2",
    vm_size="Standard_NC12s_v3",
    public_ingress=True,  # expose the scheduler on a public IP
    disk_size=100,  # OS disk size in GB
    n_workers=2,
    worker_class="dask_cuda.CUDAWorker",  # GPU-aware worker
    docker_image="nvcr.io/nvidia/rapidsai/base:24.04-cuda11.8-py3.10",
    # Publish the Dask dashboard (8787) and scheduler (8786) ports.
    docker_args="-p 8787:8787 -p 8786:8786",
)
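Rather than pasting identifiers into the session, the four Azure values could be read from environment variables and checked before any VM is requested. The sketch below is one way to do that; the helper `azure_settings_from_env` and the `AZURE_*` variable names are hypothetical choices for illustration, not something Dask Cloud Provider defines.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_ENV = {
    "resource_group": "AZURE_RESOURCE_GROUP",
    "vnet": "AZURE_VNET",
    "security_group": "AZURE_SECURITY_GROUP",
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
}


def azure_settings_from_env(environ=os.environ) -> dict:
    """Collect the Azure identifiers, failing early if any is missing."""
    settings = {}
    missing = []
    for key, var in REQUIRED_ENV.items():
        value = environ.get(var, "").strip()
        # Treat unfilled "<PLACEHOLDER>" values the same as missing ones.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(var)
        else:
            settings[key] = value
    if missing:
        raise ValueError(f"Missing or placeholder values for: {', '.join(missing)}")
    return settings


# Possible usage, with the remaining arguments as in the example above:
# cluster = AzureVMCluster(**azure_settings_from_env(), location="westus2", ...)
```

Failing before the constructor runs avoids waiting on a provisioning request that was never going to succeed.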

4. Test RAPIDS

To test RAPIDS, create a distributed client for the cluster and query for the GPU model.

from dask.distributed import Client

client = Client(cluster)


def get_gpu_model():
    import pynvml

    pynvml.nvmlInit()
    return pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))


client.submit(get_gpu_model).result()
Out[5]: b'Tesla V100-PCIE-16GB'
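The call above asks a single worker and, in this session, returned the name as bytes. Depending on the version of the NVML Python bindings, the name may come back as `str` instead. The sketch below shows one way to normalize the value and to summarize the GPU models across all workers; `normalize_gpu_name` and `summarize_gpus` are hypothetical helpers, and the input is assumed to be a mapping from worker address to GPU name, which is the shape `client.run` returns.

```python
from collections import Counter


def normalize_gpu_name(name) -> str:
    """Return the GPU name as text whether the bindings gave bytes or str."""
    if isinstance(name, bytes):
        return name.decode("utf-8")
    return str(name)


def summarize_gpus(names_by_worker: dict) -> dict:
    """Count how many workers report each GPU model."""
    return dict(Counter(normalize_gpu_name(n) for n in names_by_worker.values()))


# Possible usage against the live cluster (runs get_gpu_model on every worker):
# summarize_gpus(client.run(get_gpu_model))
```

With the two-worker cluster created earlier, the summary would be expected to show a single GPU model with a count of two.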

5. Cleanup

Once you are done with the cluster, close both the client and the cluster so that the Azure VMs are shut down:

client.close()
cluster.close()
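Because the VMs keep accruing cost until the cluster is closed, it may be worth making cleanup automatic even when the work in between raises an exception. The following is a minimal sketch using the standard library's `ExitStack`; `run_and_cleanup` is a hypothetical helper, and it assumes only that the cluster and client objects each expose a `close()` method, as they do in the example above.

```python
from contextlib import ExitStack


def run_and_cleanup(cluster, make_client, work):
    """Run work(client), then close client and cluster even if work raises."""
    with ExitStack() as stack:
        # Callbacks run in reverse order of registration:
        # the client is closed first, then the cluster.
        stack.callback(cluster.close)
        client = make_client(cluster)
        stack.callback(client.close)
        return work(client)


# Possible usage with the objects from this guide:
# run_and_cleanup(cluster, Client, lambda c: c.submit(get_gpu_model).result())
```

Registering `cluster.close` before creating the client means the cluster is still closed if connecting the client fails.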

Related Examples

Multi-Node Multi-GPU XGBoost example on Azure using dask-cloudprovider
