GitHub Actions

Overview

The RAPIDS team uses GitHub Actions for CI/CD. The official documentation for GitHub Actions can be viewed here.

Intended audience

Operations

Developers

Introduction
GitHub Action Workflows
How Nightlies Are Triggered
Subscribing to Nightlies
Reusable Workflows
Reusable Shell Scripts
Downloading CI Artifacts
Using Conda CI Artifacts Locally
Using Wheel CI Artifacts Locally
Using Conda CI Artifacts in Other PRs
Using Wheel CI Artifacts in Other PRs
Skipping CI for Commits
Rerunning Failed GitHub Actions

Introduction

RAPIDS uses self-hosted runners provided by NVIDIA for GPU-enabled testing. More information about these self-hosted runners can be found on the official documentation site here.

Additionally, the section here about pull request testing may be useful for users who are not already familiar with the process.

Finally, the page here outlines the list of runner labels that are available for use.

GitHub Action Workflows

Every RAPIDS repository using GitHub Actions has, at a minimum, the following three GitHub Action workflow files:

pr.yaml - rmm workflow example, rmm workflow run history
build.yaml - rmm workflow example, rmm workflow run history
test.yaml - rmm workflow example, rmm workflow run history

These GitHub Actions workflow files contain a description of all the automated jobs that run as a part of the workflow.

These jobs contain things like C++/Python builds, C++/Python tests, notebook tests, etc.

The chart below provides an overview of how each workflow file is used.

Event	Runs workflows	Performs builds?	Performs tests?	Uploads to Anaconda.org/wheel registry?
PRs	`pr.yaml`	✅	✅	❌
`branch-*` merges, releases	`build.yaml`	✅	❌	✅
Nightlies	`build.yaml`, `test.yaml`	✅	✅	✅

Although release workflows don’t run tests, they do go through a week of nightly testing to ensure everything works as expected. See this page for more details about the release process.

How Nightlies Are Triggered

Since RAPIDS consists of a collection of libraries that depend on each other, it’s important that nightly builds and tests run in the correct order.

The rapidsai/workflows repository has a nightly pipeline job that is responsible for triggering jobs in the correct order.
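
To illustrate what "correct order" means, the sketch below uses the standard tsort utility to order a small, hypothetical subset of the dependency graph (rmm before cudf, cudf before cuml). It only demonstrates dependency ordering in general. It is not how the rapidsai/workflows pipeline is implemented, and the edge list is an assumption made for the example.

```shell
# Each line is "dependency dependent": the library on the left must be
# built and tested before the library on the right.
# This edge list is a hypothetical subset chosen for illustration.
nightly_order() {
  printf '%s\n' \
    "rmm cudf" \
    "cudf cuml" | tsort
}

# Print the libraries in an order that respects every edge above.
nightly_order
```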

An example workflow run can be seen in the screenshot below.

Subscribing to Nightlies

A recent blog post by GitHub explains how workflows can be subscribed to via Slack.

The gist of the article is that the following command can be run in any Slack channel to subscribe that channel to a particular workflow:

/github subscribe owner/repo workflows:{name: "workflow_name"}

Multiple workflow names can also be passed to the command to subscribe to several workflows at once (shown in the example below).

For RAPIDS libraries, it is recommended to use the following commands to subscribe a particular Slack channel to branch build, nightly build, and nightly test workflow runs:

/github subscribe rapidsai/<repo> workflows:{name: "test","build"}
/github unsubscribe rapidsai/<repo> issues pulls commits releases deployments

The second command is necessary because the /github subscribe command also subscribes the channel to many other GitHub events, which would add a lot of noise.

The name field in the workflows object corresponds to the name of a particular workflow (e.g. this field).

To only subscribe to nightly builds and nightly tests (and not branch builds), the actor filter can be used:

/github subscribe rapidsai/<repo> workflows:{name: "test","build", actor:"GPUtester"}

The GPUtester account is a system account used to trigger nightly workflow runs from an upstream workflow.

Reusable Workflows

RAPIDS uses a collection of reusable GitHub Actions workflows in order to single-source common build configuration settings. These reusable workflows can be found in the rapidsai/shared-workflows repository.

An example of one of the reusable workflows used by RAPIDS is the conda-cpp-build.yaml workflow, which is the source of truth for which architectures and CUDA versions build RAPIDS C++ packages.

Similarly, the conda-cpp-tests.yaml workflow specifies configurations for testing RAPIDS C++ packages.

The majority of these reusable workflows leverage the CI images from the rapidsai/ci-imgs repository.

Reusable Shell Scripts

In addition to the reusable GitHub Actions workflows, RAPIDS projects also leverage reusable shell scripts from the rapidsai/gha-tools repository.

All of these shell scripts are prefixed with the string rapids-.

As an example, rapids-print-env is used to print common environment information.

rapids-mamba-retry is another tool that wraps the mamba executable to retry commands that fail due to transient issues like network problems.
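
The general idea behind such a retry wrapper can be sketched in a few lines of shell. The retry function and the RETRY_SLEEP_SECONDS variable below are hypothetical names used only for illustration. The real rapids-mamba-retry script may decide what to retry, and how long to wait, quite differently.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# Hypothetical helper; not part of rapidsai/gha-tools.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts so a transient network issue has time to clear.
    sleep "${RETRY_SLEEP_SECONDS:-1}"
  done
}
```

With a wrapper like this, a call such as retry 3 mamba install librmm would attempt the install up to three times before giving up.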

Downloading CI Artifacts

For NVIDIA employees with VPN access, artifacts from both pull requests and branch builds can be accessed on https://downloads.rapids.ai/.

There is a link provided at the end of every C++ and Python build job where the build artifacts for that particular workflow run can be accessed.

Using Conda CI Artifacts Locally

The artifacts that result from running conda build are conda channels. RAPIDS’ CI system then compresses these conda channels into tarballs and uploads them to https://downloads.rapids.ai/.

The packages in the conda channel can be used by extracting the tarball to your local filesystem and using the resulting path in your conda commands.

For example, the following snippet will download a pull request artifact for librmm and install it into the active conda environment:

wget https://downloads.rapids.ai/ci/rmm/pull-request/1376/5124d43/rmm_conda_cpp_cuda11_x86_64.tar.gz
mkdir local_channel
tar xzf rmm_conda_cpp_cuda11_x86_64.tar.gz -C local_channel/
mamba install --channel file://local_channel --channel rapidsai-nightly --channel conda-forge --channel nvidia librmm

Note that CI artifacts can only be downloaded while connected to the NVIDIA VPN.
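
The download URL in the example above appears to follow a fixed layout: repository name, pull request number, short commit hash, and tarball name. The helper below is a minimal sketch that assembles a URL from those parts. The layout is inferred from the example URL only and is an assumption, and pr_artifact_url is a hypothetical function name. The link printed at the end of each build job remains the reliable source for the real URL.

```shell
# pr_artifact_url REPO PR_NUMBER SHORT_SHA FILE_NAME
# Prints a pull request artifact URL.
# The path layout is inferred from the example above, not from a specification.
pr_artifact_url() {
  local repo=$1
  local pr_number=$2
  local short_sha=$3
  local file_name=$4
  echo "https://downloads.rapids.ai/ci/${repo}/pull-request/${pr_number}/${short_sha}/${file_name}"
}

# Reproduces the URL used in the example above.
pr_artifact_url rmm 1376 5124d43 rmm_conda_cpp_cuda11_x86_64.tar.gz
```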

Using Wheel CI Artifacts Locally

RAPIDS’ CI system compresses the wheels that it builds into tarballs and uploads them to https://downloads.rapids.ai/.

The wheels can be used by extracting the tarball to your local filesystem and using the resulting path in your pip commands.

For example, the following snippet will download a pull request artifact for rmm and install it into the active Python environment:

wget https://downloads.rapids.ai/ci/rmm/pull-request/1376/5124d43/rmm_wheel_python_rmm_cu12_39_x86_64.tar.gz
mkdir wheels
tar xzf rmm_wheel_python_rmm_cu12_39_x86_64.tar.gz -C wheels/
pip install wheels/rmm_cu12-24.2.0a1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Note that CI artifacts can only be downloaded while connected to the NVIDIA VPN.

Using Conda CI Artifacts in Other PRs

For changes that cross library boundaries, it may be necessary to test a pull request to one library with changes from a pull request to another library. Consider the overall RAPIDS dependency graph when testing. For example, if you are testing artifacts from an RMM PR rmm#A in cuML, you probably also need to create a cuDF PR cudf#B that uses the artifacts from rmm#A, and then your cuML test PR will need to include the artifact channels for both rmm#A and cudf#B.

To do this, it is necessary to download CI artifacts (described in the above section) from one library during the CI workflow of another library. First, determine the pull request number(s) to be tested from the other library. Then, fetch the CI artifacts from the other library’s pull request and use them when building and testing. The example code below demonstrates building and testing with conda packages from other library PRs. Replace the pull request numbers and library names as needed. Remember that changes to use CI artifacts should be temporary and should be reverted prior to merging any required changes in that PR.

Example 1: Building libcuml (C++) using librmm and libraft PR artifacts.

Add a new file called ci/use_conda_packages_from_prs.sh.

# ci/use_conda_packages_from_prs.sh

# download CI artifacts
LIBRAFT_CHANNEL=$(rapids-get-pr-conda-artifact raft 1388 cpp)
LIBRMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1095 cpp)

# make sure they can be found locally
conda config --system --add channels "${LIBRAFT_CHANNEL}"
conda config --system --add channels "${LIBRMM_CHANNEL}"

Then copy the following into every script in the ci/ directory that is doing conda installs.

source ./ci/use_conda_packages_from_prs.sh

Example 2: Testing cudf (Python) using librmm, rmm, and libkvikio PR artifacts.

It’s important to include all of the recursive dependencies. So, for example, Python testing jobs that use the rmm Python package also need the librmm C++ package.

# ci/use_conda_packages_from_prs.sh

# download CI artifacts
LIBKVIKIO_CHANNEL=$(rapids-get-pr-conda-artifact kvikio 224 cpp)
LIBRMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1223 cpp)
RMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1223 python)

# make sure they can be found locally
conda config --system --add channels "${LIBKVIKIO_CHANNEL}"
conda config --system --add channels "${LIBRMM_CHANNEL}"
conda config --system --add channels "${RMM_CHANNEL}"

Then copy the following into every script in the ci/ directory that is doing conda installs.

source ./ci/use_conda_packages_from_prs.sh

Note: By default rapids-get-pr-conda-artifact uses the most recent commit from the specified PR. A commit hash from the dependent PR can be added as an optional 4th argument to pin testing to a specific commit.

Using Wheel CI Artifacts in Other PRs

To use wheels produced by other PRs’ CI:

- download the wheels at the beginning of CI jobs
- constrain pip to always use them

Consider the following examples.

Example 1: Building libcuml (C++) using librmm and libraft PR artifacts.

Add a new file called ci/use_wheels_from_prs.sh.

# ci/use_wheels_from_prs.sh

RAPIDS_PY_CUDA_SUFFIX=$(rapids-wheel-ctk-name-gen "${RAPIDS_CUDA_VERSION}")

# download wheels, store the directories holding them in variables
LIBRMM_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-wheel-artifact rmm 1678 cpp
)
LIBRAFT_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-wheel-artifact raft 2433 cpp
)

# write a pip constraints file saying e.g. "whenever you encounter a requirement for 'librmm-cu12', use this wheel"
cat > /tmp/constraints.txt <<EOF
librmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBRMM_WHEELHOUSE}/librmm_*.whl)
libraft-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBRAFT_WHEELHOUSE}/libraft_*.whl)
EOF

export PIP_CONSTRAINT=/tmp/constraints.txt

Then copy the following into every script in the ci/ directory that is doing pip installs or wheel builds with e.g. pip wheel.

source ./ci/use_wheels_from_prs.sh

This should generally be enough. If any of the other CI scripts are already setting the environment variable PIP_CONSTRAINT, you may need to modify them slightly to ensure they append to, instead of overwriting, the constraints set up by use_wheels_from_prs.sh.
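
One way to append rather than overwrite is sketched below. The append_pip_constraint function is a hypothetical helper, not something provided by rapidsai/gha-tools, and the numpy constraint in the usage note is only a placeholder. The sketch assumes PIP_CONSTRAINT, when set, points to a single writable file.

```shell
# append_pip_constraint REQUIREMENT
# Adds one line to the constraints file that PIP_CONSTRAINT already points to,
# or starts a new file if PIP_CONSTRAINT is unset.
# Hypothetical helper for illustration.
append_pip_constraint() {
  local requirement=$1
  # /tmp/constraints.txt matches the path used by use_wheels_from_prs.sh.
  local constraint_file="${PIP_CONSTRAINT:-/tmp/constraints.txt}"
  # ">>" appends; ">" would discard the PR wheel constraints written earlier.
  echo "${requirement}" >> "${constraint_file}"
  export PIP_CONSTRAINT="${constraint_file}"
}
```

A script that previously wrote its own constraints file could then call, for example, append_pip_constraint "numpy<2" after sourcing use_wheels_from_prs.sh.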

Example 2: Testing cudf (Python) using librmm, rmm, and libkvikio PR artifacts.

It’s important to include all of the recursive dependencies. So, for example, Python testing jobs that use the rmm Python package also need the librmm C++ package.

# ci/use_wheels_from_prs.sh

RAPIDS_PY_CUDA_SUFFIX=$(rapids-wheel-ctk-name-gen "${RAPIDS_CUDA_VERSION}")

# download wheels, store the directories holding them in variables
LIBKVIKIO_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libkvikio_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-wheel-artifact kvikio 510 cpp
)
LIBRMM_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-wheel-artifact rmm 1678 cpp
)
RMM_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="rmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-wheel-artifact rmm 1678 python
)

# write a pip constraints file saying e.g. "whenever you encounter a requirement for 'librmm-cu12', use this wheel"
cat > /tmp/constraints.txt <<EOF
libkvikio-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBKVIKIO_WHEELHOUSE}/libkvikio_*.whl)
librmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBRMM_WHEELHOUSE}/librmm_*.whl)
rmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${RMM_WHEELHOUSE}/rmm_*.whl)
EOF

export PIP_CONSTRAINT=/tmp/constraints.txt

Then copy the following into every script in the ci/ directory that is doing pip installs or wheel builds with e.g. pip wheel.

source ./ci/use_wheels_from_prs.sh

As above, if any of the other CI scripts are already setting the environment variable PIP_CONSTRAINT, you may need to modify them slightly to ensure they append to, instead of overwriting, the constraints set up by use_wheels_from_prs.sh.

Note: By default rapids-get-pr-wheel-artifact uses the most recent commit from the specified PR. A commit hash from the dependent PR can be added as an optional 4th argument to pin testing to a specific commit.
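
The constraints files in the examples above rely on each glob (for example librmm_*.whl) matching exactly one wheel. If a glob matches nothing or several files, the constraint line is silently malformed. The sketch below is one defensive alternative. The single_wheel function is a hypothetical helper, not part of rapidsai/gha-tools.

```shell
# single_wheel DIRECTORY PREFIX
# Prints the one wheel in DIRECTORY whose file name starts with PREFIX.
# Fails if zero or several wheels match, instead of producing a bad constraint.
# Hypothetical helper for illustration.
single_wheel() {
  local directory=$1
  local prefix=$2
  local count=0
  local match=""
  local candidate
  for candidate in "${directory}/${prefix}"*.whl; do
    # An unmatched glob is left as a literal string, so check the file exists.
    [ -e "${candidate}" ] || continue
    match=${candidate}
    count=$((count + 1))
  done
  if [ "${count}" -ne 1 ]; then
    echo "expected 1 wheel matching ${prefix}*.whl in ${directory}, found ${count}" >&2
    return 1
  fi
  echo "${match}"
}
```

To make a failure stop the script, the result is best stored in a variable before the constraints file is written, for example LIBRMM_WHEEL=$(single_wheel "${LIBRMM_WHEELHOUSE}" librmm_) || exit 1, and then referenced as file://${LIBRMM_WHEEL} in the constraint line.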

Skipping CI for Commits

See the GitHub Actions documentation page below on how to prevent GitHub Actions from running on certain commits. This is useful for preventing GitHub Actions from running on pull requests that are not fully complete. This also helps preserve the finite GPU resources provided by the RAPIDS Ops team.

GitHub Actions cannot be configured to skip every commit in a pull request at once. Skipping must be requested for each commit individually.

Link: https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs
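
The page linked above lists several markers that, when present in a commit message, skip workflows triggered by push and pull_request events. The sketch below checks a message for those markers. The commit_skips_ci function and the sample message are hypothetical and only illustrate the matching. GitHub performs the actual check.

```shell
# commit_skips_ci MESSAGE
# Succeeds if MESSAGE contains one of the skip markers listed in the
# GitHub documentation linked above.
# Hypothetical helper for illustration.
commit_skips_ci() {
  case "$1" in
    *"[skip ci]"* | *"[ci skip]"* | *"[no ci]"* | *"[skip actions]"* | *"[actions skip]"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical commit message for a work-in-progress change.
message="WIP: refactor memory pool tests [skip ci]"
if commit_skips_ci "${message}"; then
  echo "workflows would be skipped for this commit"
fi
```

In practice the marker is simply included in the commit message, for example git commit -m "WIP: refactor memory pool tests [skip ci]".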

Rerunning Failed GitHub Actions

See the GitHub Actions documentation page below on how to rerun failed workflows. In addition to rerunning an entire workflow, GitHub Actions also provides the ability to rerun only the failed jobs in a workflow.

At this time there are no alternative ways to rerun tests with GitHub Actions beyond what is described in the documentation (e.g. there is no rerun tests comment for GitHub Actions).

Link: https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs