Getting Started#
Building rapidsmpf from source is recommended when running nightly/upstream versions, since dependencies on non-ABI-stable libraries (e.g., pylibcudf) can cause temporary breakage, such as segmentation faults. Stable releases can be installed from conda or pip packages.
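For example, a stable release can typically be installed as follows (a sketch only; the exact package names, channels, and CUDA suffix are assumptions, so consult the RAPIDS installation guide for your setup):
# Conda install (channel and package name are assumptions).
mamba install -c rapidsai -c conda-forge rapidsmpf
# Or via pip (the CUDA-suffixed wheel name is an assumption).
pip install rapidsmpf-cu13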
Build from Source#
Clone rapidsmpf and install the dependencies in a conda environment:
git clone https://github.com/rapidsai/rapidsmpf.git
cd rapidsmpf
# Choose an environment file that matches your system.
mamba env create --name rapidsmpf-dev --file conda/environments/all_cuda-131_arch-$(uname -m).yaml
# Build
./build.sh
Debug Build#
Debug builds can be produced by adding the -g flag:
./build.sh -g
AddressSanitizer-Enabled Build#
Enabling AddressSanitizer is also possible with the --asan flag:
./build.sh -g --asan
C++ code built with AddressSanitizer should work out of the box, but there are caveats for CUDA
and Python code. Any CUDA code executing under AddressSanitizer requires
protect_shadow_gap=0, which can be set via an environment variable:
ASAN_OPTIONS=protect_shadow_gap=0
Python, on the other hand, may require LD_PRELOAD to be set so that AddressSanitizer
is loaded before Python itself. In a conda environment, for example, there is usually a
$CONDA_PREFIX/lib/libasan.so, and the application can then be launched as follows:
LD_PRELOAD=$CONDA_PREFIX/lib/libasan.so python ...
Python applications using CUDA will require setting both environment variables described above.
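For example (a minimal sketch reusing the placeholders above; substitute your own script and arguments):
# Set both variables so AddressSanitizer is preloaded and CUDA kernels can run under it.
ASAN_OPTIONS=protect_shadow_gap=0 LD_PRELOAD=$CONDA_PREFIX/lib/libasan.so python ...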
MPI#
Run the test suite using MPI:
# When using OpenMPI, we need to enable CUDA support.
export OMPI_MCA_opal_cuda_support=1
# Run the suite using two MPI processes.
mpirun -np 2 cpp/build/gtests/mpi_tests
# Alternatively
cd cpp/build && ctest -R mpi_tests_2
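If you are unsure which MPI test variants are available, CTest can list them without running anything (assuming the tests were registered by the build):
# List registered MPI test variants without executing them.
cd cpp/build && ctest -N -R mpi_tests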
We can also run the shuffle benchmark. To assign each MPI rank its own GPU, we use a binder script:
# The binder script requires numactl: mamba install numactl
wget https://raw.githubusercontent.com/LStuber/binding/refs/heads/master/binder.sh
chmod a+x binder.sh
mpirun -np 2 ./binder.sh cpp/build/benchmarks/bench_shuffle
UCX#
The UCX test suite uses MPI for bootstrapping, so UCX tests must be launched with
mpirun:
# Run the suite using two processes.
mpirun -np 2 cpp/build/gtests/ucxx_tests
rrun — Distributed Launcher#
RapidsMPF includes rrun, a lightweight launcher that eliminates the MPI dependency for
multi-GPU workloads. This is particularly useful for development, testing, and
environments where MPI is not available. See the
Streaming Engine documentation for more on the
programming model.
Single-Node Usage#
# Build rrun
cd cpp/build
cmake --build . --target rrun
# Launch 2 ranks on the local node
./tools/rrun -n 2 ./benchmarks/bench_comm -C ucxx -O all-to-all
# With verbose output and specific GPUs
./tools/rrun -v -n 4 -g 0,1,2,3 ./benchmarks/bench_comm -C ucxx
See the C++ documentation for the full rrun and multi-GPU launch guide.