# Dataproc

RAPIDS can be deployed on Google Cloud Dataproc using Dask. For more details, see our **[detailed instructions and helper scripts](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids)**.

**0. Copy initialization actions to your own Cloud Storage bucket.**

Don't create clusters that reference initialization actions located in the `gs://goog-dataproc-initialization-actions-REGION` public buckets. These scripts are provided as reference implementations and are synchronized with ongoing [GitHub repository](https://github.com/GoogleCloudDataproc/initialization-actions) changes.

It is strongly recommended that you copy the initialization scripts into your own Storage bucket so that unintended upstream changes do not affect your cluster:

```console
$ REGION=
$ GCS_BUCKET=
$ gcloud storage buckets create gs://$GCS_BUCKET
$ gsutil cp gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh gs://$GCS_BUCKET
$ gsutil cp gs://goog-dataproc-initialization-actions-${REGION}/dask/dask.sh gs://$GCS_BUCKET
$ gsutil cp gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh gs://$GCS_BUCKET
```

**1. Create a Dataproc cluster with Dask RAPIDS.**

Use the gcloud command to create a new cluster. Because of an Anaconda version conflict, script deployment on older images is slow, so we recommend using Dask with Dataproc 2.0+.

```{warning}
At the time of writing, [Dataproc only supports RAPIDS version 23.12 and earlier with CUDA<=11.8 and Ubuntu 18.04](https://github.com/GoogleCloudDataproc/initialization-actions/issues/1137). Please ensure that your setup complies with this compatibility requirement. Using newer RAPIDS versions may result in unexpected behavior or errors.
```

```console
$ CLUSTER_NAME=
$ DASK_RUNTIME=yarn
$ RAPIDS_VERSION=23.12
$ CUDA_VERSION=11.8
$ gcloud dataproc clusters create $CLUSTER_NAME \
    --region $REGION \
    --image-version 2.0-ubuntu18 \
    --master-machine-type n1-standard-32 \
    --master-accelerator type=nvidia-tesla-t4,count=2 \
    --worker-machine-type n1-standard-32 \
    --worker-accelerator type=nvidia-tesla-t4,count=2 \
    --initialization-actions=gs://$GCS_BUCKET/install_gpu_driver.sh,gs://$GCS_BUCKET/dask.sh,gs://$GCS_BUCKET/rapids.sh \
    --initialization-action-timeout 60m \
    --optional-components=JUPYTER \
    --metadata gpu-driver-provider=NVIDIA,dask-runtime=$DASK_RUNTIME,rapids-runtime=DASK,rapids-version=$RAPIDS_VERSION,cuda-version=$CUDA_VERSION \
    --enable-component-gateway
```

[GCS_BUCKET] = name of the bucket to use.\
[CLUSTER_NAME] = name of the cluster.\
[REGION] = name of the region where the cluster is to be created.\
[DASK_RUNTIME] = Dask runtime; can be set to either `yarn` or `standalone`.

**2. Run a Dask RAPIDS workload.**

Once the cluster has been created, the Dask scheduler listens for workers on port `8786`, and its status dashboard is served on port `8787` on the Dataproc master node.

To connect to the Dask web interface, you will need to create an SSH tunnel as described in the [Dataproc web interfaces documentation](https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces). You can also connect using the Dask Client Python API from a Jupyter notebook, or from a Python script or interpreter session, as in the sketch below.

```{relatedexamples}

```
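As a minimal sketch of connecting with the Dask Client, assuming you are running Python on the master node (or through an SSH tunnel to it) so the scheduler is reachable at `localhost:8786`, you could submit a small GPU computation with cuDF and Dask-cuDF. The scheduler address and the toy DataFrame below are illustrative assumptions, not part of the Dataproc setup itself:

```python
import cudf
import dask_cudf
from dask.distributed import Client

# Assumed scheduler address: port 8786 on the Dataproc master node.
# Adjust the host if you are connecting through an SSH tunnel.
client = Client("localhost:8786")

# Build a small GPU DataFrame and distribute it across workers.
df = cudf.DataFrame({"a": range(1_000), "b": range(1_000)})
ddf = dask_cudf.from_cudf(df, npartitions=4)

# Run a trivial aggregation on the cluster to confirm the workers respond.
print(ddf.a.mean().compute())
```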