# RapidsMPF documentation

Building high-performance GPU pipelines is hard. Each stage must move data efficiently between GPUs and processes, synchronize work, and manage limited device memory. RapidsMPF, including its nascent Streaming Engine and Out-of-Core (OOC) shuffle, provides a unified framework for asynchronous, multi-GPU pipelines using simple streaming primitives — {term}`Channel`s, {term}`Node`s, and {term}`Message`s — built on RAPIDS components: [rmm](https://docs.rapids.ai/api/rmm/nightly), [libcudf](https://docs.rapids.ai/api/libcudf/nightly/), and [ucxx](https://docs.rapids.ai/api/ucxx/nightly/).

RapidsMPF's design combines explicit data parallelism (SPMD-style coordination) with a local CSP-style streaming model, enabling the engine to overlap I/O, computation, and communication. This makes it possible to handle out-of-core processing efficiently and to integrate seamlessly with frontend query engines such as Polars. The result is clean, composable, and scalable GPU streaming — from single-node prototypes to large-scale, multi-GPU deployments.

Follow the RAPIDS [installation guide](https://docs.rapids.ai/install/) to install `rapidsmpf` and the necessary dependencies.

```{toctree}
---
maxdepth: 2
caption: Contents:
---

quickstart.md
background/index.md
api.md
```
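
The CSP-style model described above — nodes that communicate only by passing messages over channels — can be sketched in plain Python. The snippet below is an illustrative analogy only, not the `rapidsmpf` API: it uses `asyncio.Queue` as a stand-in for a {term}`Channel`, coroutines as stand-ins for {term}`Node`s, and plain values as {term}`Message`s, with `None` as a hypothetical end-of-stream marker.

```python
import asyncio

# Illustrative sketch only: asyncio queues play the role of Channels,
# coroutines play the role of Nodes, and queued items are Messages.
# This is NOT the rapidsmpf API; it only mirrors the CSP-style shape.

async def producer(ch: asyncio.Queue) -> None:
    # Source node: emits messages downstream, then an end-of-stream marker.
    for i in range(3):
        await ch.put(i)
    await ch.put(None)  # assumed sentinel marking end of stream

async def doubler(inp: asyncio.Queue, out: asyncio.Queue) -> None:
    # Transform node: receives, processes, and forwards each message.
    while (msg := await inp.get()) is not None:
        await out.put(msg * 2)
    await out.put(None)  # propagate end-of-stream downstream

async def collect(ch: asyncio.Queue) -> list:
    # Sink node: drains its input channel into a list.
    results = []
    while (msg := await ch.get()) is not None:
        results.append(msg)
    return results

async def main() -> list:
    a, b = asyncio.Queue(), asyncio.Queue()
    # All three nodes run concurrently; the channels decouple their pacing,
    # which is what lets a real engine overlap I/O, compute, and communication.
    _, _, out = await asyncio.gather(producer(a), doubler(a, b), collect(b))
    return out

print(asyncio.run(main()))  # [0, 2, 4]
```

Because each node only sees its channels, nodes compose freely into larger pipelines; the real engine applies the same idea to GPU buffers moving between processes.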