# C++

RapidsMPF exposes a full C++ API for building high-performance distributed GPU workloads without a Python runtime. The C++ layer is the foundation on which the Python bindings are built.

The C++ API reference is available at [docs.rapids.ai/api/librapidsmpf/stable](https://docs.rapids.ai/api/librapidsmpf/stable/) ([nightly](https://docs.rapids.ai/api/librapidsmpf/nightly/)).

## Coverage

The C++ API provides access to all core RapidsMPF subsystems:

- **Communicator** — MPI and UCXX backends for inter-process communication.
- **Shuffler** — Out-of-core, distributed table shuffle service.
- **Streaming Engine** — Asynchronous multi-GPU pipeline with Channels, Actors, and Messages.
- **Memory** — BufferResource, spilling, pinned memory, and packed data utilities.
- **Config** — Configuration options and environment-variable parsing.

## Table Shuffle Service

See {doc}`../background/shuffle-architecture` for an in-depth explanation of the shuffle design.

The following is a complete MPI program that uses the RapidsMPF shuffler:

```{literalinclude} ../../../cpp/examples/example_shuffle.cpp
:language: cpp
:lines: 7-
```

## rrun — Distributed Launcher

RapidsMPF includes `rrun`, a lightweight launcher that eliminates the MPI dependency for multi-GPU workloads. See {doc}`../background/streaming-engine` for more on the programming model.

### Build rrun

```bash
cd cpp/build
cmake --build . --target rrun
```

### Single-Node Launch

```bash
# Launch 2 ranks on the local node
./tools/rrun -n 2 ./benchmarks/bench_comm -C ucxx -O all-to-all

# With verbose output and specific GPUs
./tools/rrun -v -n 4 -g 0,1,2,3 ./benchmarks/bench_comm -C ucxx
```
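
For orientation, the sketch below shows the generic per-rank scaffolding that an MPI-launched RapidsMPF program, such as the shuffle example included above, builds on: initialize MPI, bind the rank to a GPU, and set up RMM device memory before constructing any RapidsMPF objects. Only standard MPI, CUDA runtime, and RMM calls appear as code; the RapidsMPF-specific steps are left as comments because their exact class names and signatures should be taken from the linked API reference rather than assumed here.

```cpp
// Per-rank scaffolding for an MPI-launched RapidsMPF program.
// Only standard MPI, CUDA runtime, and RMM calls are used; the
// RapidsMPF-specific objects are sketched in comments because their
// exact headers and signatures are assumptions -- consult the
// librapidsmpf API reference for the real ones.
#include <cuda_runtime.h>
#include <mpi.h>

#include <rmm/mr/device/cuda_memory_resource.hpp>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    int nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // One GPU per rank: map the MPI rank onto the locally visible devices.
    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(rank % num_devices);

    // Device memory for this rank; RapidsMPF's BufferResource (see the
    // Memory subsystem above) wraps a resource like this one.
    rmm::mr::cuda_memory_resource mr;

    // RapidsMPF-specific steps (names are assumptions, not the verified API):
    //   1. Construct a communicator over MPI_COMM_WORLD (MPI or UCXX backend).
    //   2. Wrap `mr` in a BufferResource for allocation, tracking, and spilling.
    //   3. Create a Shuffler (or a streaming-engine pipeline), insert local
    //      partitions, and extract the partitions assigned to this rank,
    //      as the full example included above does.

    MPI_Finalize();
    return 0;
}
```

Workloads launched with `rrun` follow the same overall pattern but, as described in the launcher section above, do not depend on MPI for bootstrapping.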