RAPIDS on Databricks

0. Prerequisites

  1. Your Databricks workspace must have Databricks Container Services enabled.

  2. Your machine must be running a recent Docker daemon (one that is tested and works with version 18.03.0-ce), and the docker command must be available on your PATH.

  3. We recommend building from a Databricks base image, although you can also build your Docker base from scratch. In either case, the Docker image must meet the requirements set out in the Databricks Container Services documentation.
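
The Docker prerequisite above can be checked with a short script. This is a minimal sketch, not part of the official setup: the helper names `parse_docker_version` and `docker_client_version` are hypothetical, and the script inspects the local docker client (via `docker --version`) rather than the daemon itself, on the assumption that the two are installed together.

```python
import re
import shutil
import subprocess

# Version named in the prerequisites above, as a comparable tuple.
MIN_DOCKER = (18, 3, 0)


def parse_docker_version(text):
    """Extract (major, minor, patch) from a string such as '18.03.0-ce'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def docker_client_version():
    """Return the docker client version tuple, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return parse_docker_version(result.stdout)


if __name__ == "__main__":
    version = docker_client_version()
    if version is None:
        print("docker was not found on PATH")
    elif version < MIN_DOCKER:
        print(f"docker {version} is older than {MIN_DOCKER}")
    else:
        print(f"docker {version} looks recent enough")
```

Tuples compare element by element, so `(24, 0, 5) >= (18, 3, 0)` holds without any string handling.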

1. Build a custom RAPIDS container

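Create a Dockerfile in a directory named docker (the build context used by the build command). It exports the conda environment spec from a RAPIDS image and recreates that environment on top of the Databricks GPU base image:

ARG RAPIDS_IMAGE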

FROM $RAPIDS_IMAGE as rapids

RUN conda list -n rapids --explicit > /rapids/rapids-spec.txt

FROM databricksruntime/gpu-conda:cuda11

COPY --from=rapids /rapids/rapids-spec.txt /tmp/spec.txt

RUN conda create --name rapids --file /tmp/spec.txt && \
    rm -f /tmp/spec.txt

Build the image, passing the RAPIDS image whose environment you want to copy as a build argument:

$ docker build --tag <username>/rapids_databricks:latest --build-arg RAPIDS_IMAGE=nvcr.io/nvidia/rapidsai/rapidsai-core:23.06-cuda11.8-runtime-ubuntu22.04-py3.10 ./docker

Push this image to a Docker registry (Docker Hub, Amazon ECR, or Azure ACR).
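
Each registry expects a slightly different image reference, and the same reference is what you later enter as the Docker image URL for the cluster. The sketch below composes the reference for each of the three registries and the `docker tag` / `docker push` commands that go with it. The function names, account ID, and region are hypothetical placeholders, and the script only prints the commands. It assumes you have already authenticated with `docker login` against the target registry.

```python
def dockerhub_image(username, repository, tag="latest"):
    """Docker Hub reference: <username>/<repository>:<tag>."""
    return f"{username}/{repository}:{tag}"


def ecr_image(account_id, region, repository, tag="latest"):
    """Amazon ECR reference: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def acr_image(registry_name, repository, tag="latest"):
    """Azure Container Registry reference: <registry>.azurecr.io/<repository>:<tag>."""
    return f"{registry_name}.azurecr.io/{repository}:{tag}"


def push_commands(local_image, remote_image):
    """Return the docker commands (as argument lists) that retag and push an image."""
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


# Placeholder values for illustration only.
local = dockerhub_image("<username>", "rapids_databricks")
remote = ecr_image("123456789012", "us-east-1", "rapids_databricks")
for command in push_commands(local, remote):
    print(" ".join(command))
```

When the image is already tagged for Docker Hub, as in the build command above, the retag step is unnecessary and `docker push` alone is enough.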

2. Configure and create a GPU-enabled cluster

  1. Go to Compute > Create compute, name your cluster, and select Multi node or Single node.

  2. Select a Standard Databricks runtime.

    • Note: Databricks ML Runtime does not support Databricks Container Services.

  3. Under Advanced Options, in the Docker tab, select “Use your own Docker container”.

    • In the Docker Image URL field, enter the image that you created above.

    • Select the authentication type.

  4. Select a GPU-enabled worker and driver type.

    • The selected GPU must be Pascal generation or newer (e.g. g4dn.xlarge).

  5. Create and launch your cluster
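
The same choices can be written down as a cluster specification for the Databricks Clusters API instead of being clicked through in the UI. The following is a rough sketch that only builds and prints the JSON body. The function name `cluster_spec`, the cluster name, and the `<standard-runtime-version>` placeholder are assumptions for illustration; look up the actual runtime version string in your workspace, and treat the field layout as something to verify against the Clusters API reference before use.

```python
import json


def cluster_spec(name, image_url, spark_version, node_type,
                 num_workers=1, registry_user=None, registry_password=None):
    """Build a request body for creating a cluster that runs a custom container."""
    docker_image = {"url": image_url}
    # Credentials are only needed for private registries using basic auth.
    if registry_user is not None:
        docker_image["basic_auth"] = {
            "username": registry_user,
            "password": registry_password,
        }
    return {
        "cluster_name": name,
        "spark_version": spark_version,    # a Standard (non-ML) runtime
        "node_type_id": node_type,         # GPU-enabled worker type
        "driver_node_type_id": node_type,  # GPU-enabled driver type
        "num_workers": num_workers,
        "docker_image": docker_image,
    }


spec = cluster_spec(
    name="rapids-gpu",                                # hypothetical cluster name
    image_url="<username>/rapids_databricks:latest",  # image built in step 1
    spark_version="<standard-runtime-version>",       # placeholder, not a real key
    node_type="g4dn.xlarge",
)
print(json.dumps(spec, indent=2))
```

Keeping the specification in code makes it easy to review the image URL and node types together before any cluster is launched.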

3. Test RAPIDS
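
Once the cluster is running, a quick way to test the installation is to run a small computation from a notebook attached to it. The cell below is a minimal sketch rather than an official test: the helper names `installed` and `cudf_smoke_test` are hypothetical, and it assumes the notebook's Python interpreter is the one that has the RAPIDS packages installed. It first reports which RAPIDS modules can be found, then runs a tiny cuDF group-by only if cudf is present.

```python
import importlib.util


def installed(modules=("cudf", "cuml")):
    """Report which of the given modules can be found by this interpreter."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def cudf_smoke_test():
    """Run a small group-by on the GPU and return the per-key sums as a list."""
    import cudf  # imported lazily so the availability check works without it

    frame = cudf.DataFrame({"key": [0, 0, 1, 1], "value": [1.0, 2.0, 3.0, 4.0]})
    totals = frame.groupby("key")["value"].sum().sort_index()
    return totals.to_pandas().tolist()


status = installed()
print(status)
if status["cudf"]:
    print(cudf_smoke_test())
```

On a working cluster the group-by sums the values for each key, so the second line printed should be `[3.0, 7.0]`. If cudf is reported as missing, check that the cluster is using the custom image and not a default runtime.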

For more details on integrating Databricks Jobs with MLflow and RAPIDS, check out this blog post.