RapidsMPF exposes a full C++ API for building high-performance distributed GPU workloads. It provides communication primitives, an out-of-core distributed shuffle service, and an asynchronous multi-GPU streaming engine built on RAPIDS components.