Classes | |
| struct | bind_options |
| Options controlling which topology-based resource bindings to apply. More... | |
| struct | resource_binding |
| Live resource binding configuration collected from the running process. More... | |
| struct | expected_binding |
| Expected resource binding derived from topology information. More... | |
| struct | binding_validation |
| Results of validating actual vs. expected resource bindings. More... | |
| class | ScopedEnvVar |
| RAII guard that saves, optionally modifies, and restores an environment variable. More... | |
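The save/modify/restore behaviour described for ScopedEnvVar can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name `EnvVarGuard` and its constructor signature are hypothetical, and it assumes a POSIX environment where `setenv` and `unsetenv` are available.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for illustration only; not the library's ScopedEnvVar.
class EnvVarGuard {
 public:
  // Saves the current value of `name`. If `new_value` is given, sets the
  // variable to it; otherwise the variable is left untouched for now.
  explicit EnvVarGuard(std::string name,
                       std::optional<std::string> new_value = std::nullopt)
      : name_(std::move(name)) {
    assert(!name_.empty());
    if (const char* old = std::getenv(name_.c_str())) { saved_ = old; }
    if (new_value) { ::setenv(name_.c_str(), new_value->c_str(), 1); }
  }

  // Restores the saved value, or unsets the variable if it was unset before.
  ~EnvVarGuard() {
    if (saved_) {
      ::setenv(name_.c_str(), saved_->c_str(), 1);
    } else {
      ::unsetenv(name_.c_str());
    }
  }

  // Non-copyable: two guards restoring the same variable would conflict.
  EnvVarGuard(EnvVarGuard const&) = delete;
  EnvVarGuard& operator=(EnvVarGuard const&) = delete;

 private:
  std::string name_;
  std::optional<std::string> saved_;  // empty if the variable was unset
};
```

Constructing a guard at the top of a scope and letting it go out of scope restores the variable even when the scope exits via an exception. Like any change to the environment, this is not safe if other threads read or write environment variables concurrently.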
Functions | |
| resource_binding | check_binding (int gpu_id_hint=-1) |
| Collect the live resource binding of the calling process. More... | |
| std::optional< expected_binding > | get_expected_binding (cucascade::memory::system_topology_info const &topology, int gpu_id) |
| Obtain the expected binding for a GPU from pre-discovered topology. More... | |
| binding_validation | validate_binding (resource_binding const &actual, expected_binding const &expected) |
| Validate an actual resource binding against an expected one. More... | |
| void | bind (std::optional< unsigned int > gpu_id=std::nullopt, bind_options const &options={}) |
| Bind the calling process to resources topologically close to a GPU. More... | |
| void | bind (cucascade::memory::system_topology_info const &topology, std::optional< unsigned int > gpu_id=std::nullopt, bind_options const &options={}) |
| Bind using pre-discovered topology information. More... | |
SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0
| void rapidsmpf::rrun::bind | ( | cucascade::memory::system_topology_info const & | topology, |
| std::optional< unsigned int > | gpu_id = std::nullopt, |
| bind_options const & | options = {} |
| ) |
Bind using pre-discovered topology information.
Same as the other overload, but skips the topology discovery step by reusing a previously obtained system_topology_info. Useful when the caller has already performed discovery (e.g., in a parent process before forking).
GPU resolution follows the same order as the other overload (explicit gpu_id, then CUDA_VISIBLE_DEVICES).
This function mutates process-wide state (CPU affinity, NUMA memory policy, and the UCX_NET_DEVICES environment variable). It should be called exactly once per process, ideally early in initialization and before other threads are spawned.
| topology | Pre-discovered system topology. |
| gpu_id | GPU device index to bind for. When std::nullopt, the first GPU in CUDA_VISIBLE_DEVICES is used instead. |
| options | Controls which resource bindings to apply. |
| std::runtime_error | if no GPU ID can be determined, the resolved GPU is not found in topology, an enabled binding (CPU affinity, NUMA memory policy, network devices) could not be applied, or post-bind verification detects a mismatch between the requested and actual binding state. |
| void rapidsmpf::rrun::bind | ( | std::optional< unsigned int > | gpu_id = std::nullopt, |
| bind_options const & | options = {} |
| ) |
Bind the calling process to resources topologically close to a GPU.
Discovers the system topology via cucascade::memory::topology_discovery, then applies CPU affinity, NUMA memory binding, and/or network device configuration as requested in options.
This is the self-contained entry point intended for external libraries that do not launch through the rrun CLI.
This function modifies the CUDA_VISIBLE_DEVICES environment variable during topology discovery and mutates process-wide state (CPU affinity, NUMA memory policy, and the UCX_NET_DEVICES environment variable). It should be called exactly once per process, ideally early in initialization and before other threads are spawned.
GPU resolution order:
1. Explicit gpu_id, if provided.
2. First GPU listed in the CUDA_VISIBLE_DEVICES environment variable.
3. Otherwise, std::runtime_error is thrown.
| gpu_id | GPU device index (as reported by nvidia-smi) to bind for. When std::nullopt, the first GPU in CUDA_VISIBLE_DEVICES is used instead. |
| options | Controls which resource bindings to apply. |
| std::runtime_error | if no GPU ID can be determined, topology discovery fails, the resolved GPU is not found in the discovered topology, an enabled binding (CPU affinity, NUMA memory policy, network devices) could not be applied, or post-bind verification detects a mismatch between the requested and actual binding state. |
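The GPU resolution order documented for bind() can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's code: `resolve_gpu_id` is a hypothetical helper, the environment value is passed in as a parameter to keep the example testable, and only plain integer entries in CUDA_VISIBLE_DEVICES are handled (UUID-style entries are treated as unusable).

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper illustrating the documented resolution order.
// `visible_devices` is the raw value of CUDA_VISIBLE_DEVICES, or nullptr
// when the variable is unset.
unsigned int resolve_gpu_id(std::optional<unsigned int> gpu_id,
                            const char* visible_devices) {
  // 1. An explicit gpu_id always takes precedence.
  if (gpu_id) { return *gpu_id; }

  // 2. Otherwise, use the first comma-separated entry of the variable.
  if (visible_devices != nullptr) {
    std::string const value(visible_devices);
    std::string const first = value.substr(0, value.find(','));
    // Assumption: accept only short, purely numeric entries.
    bool const numeric =
        !first.empty() && first.size() <= 9 &&
        first.find_first_not_of("0123456789") == std::string::npos;
    if (numeric) { return static_cast<unsigned int>(std::stoul(first)); }
  }

  // 3. No usable GPU ID: report the failure to the caller.
  throw std::runtime_error("no GPU ID could be determined");
}
```

For example, `resolve_gpu_id(3u, "0,1")` yields 3 because the explicit argument wins, while `resolve_gpu_id(std::nullopt, "2,0")` yields 2, the first listed device.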
| resource_binding rapidsmpf::rrun::check_binding | ( | int | gpu_id_hint = -1 | ) |
Collect the live resource binding of the calling process.
Queries the current CPU affinity, NUMA memory nodes, UCX network device configuration, process rank, and GPU information. Fields that cannot be determined (e.g. rank when no launcher environment is set, or GPU ID when CUDA_VISIBLE_DEVICES is absent and no hint is given) are left at their default value of -1.
| gpu_id_hint | GPU device index hint. When >= 0 the value is stored directly; otherwise the GPU ID is read from CUDA_VISIBLE_DEVICES. When a valid GPU ID is available, the PCI bus ID is also queried. |
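One of the values check_binding() reports is the current CPU affinity. How the library obtains it is not stated here; on Linux, one plausible way to read it is `sched_getaffinity`, sketched below. The helper name `current_cpu_affinity` is hypothetical, and returning an empty vector on failure is a choice made for this example only.

```cpp
#include <sched.h>

#include <vector>

// Hypothetical helper (Linux-specific): returns the indices of the CPUs the
// calling thread is allowed to run on, in ascending order. Returns an empty
// vector if the affinity mask cannot be queried.
std::vector<int> current_cpu_affinity() {
  cpu_set_t mask;
  CPU_ZERO(&mask);

  std::vector<int> cpus;
  // A pid of 0 refers to the calling thread.
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) { return cpus; }

  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &mask)) { cpus.push_back(cpu); }
  }
  return cpus;
}
```

Note that a fixed-size `cpu_set_t` covers at most `CPU_SETSIZE` CPUs; machines with more CPUs need the dynamically sized `CPU_ALLOC` variants.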
| std::optional<expected_binding> rapidsmpf::rrun::get_expected_binding | ( | cucascade::memory::system_topology_info const & | topology, |
| int | gpu_id | ||
| ) |
Obtain the expected binding for a GPU from pre-discovered topology.
Looks up gpu_id in topology and returns the expected CPU affinity, memory binding, and network devices.
| topology | Pre-discovered system topology. |
| gpu_id | GPU device index to look up. |
Returns the expected binding, or std::nullopt if gpu_id is not found.
| binding_validation rapidsmpf::rrun::validate_binding | ( | resource_binding const & | actual, |
| expected_binding const & | expected | ||
| ) |
Validate an actual resource binding against an expected one.
Compares the live actual binding with expected and reports per-resource pass/fail status.
| actual | Live resource binding (from check_binding()). |
| expected | Expected binding (from topology or a JSON file). |
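The per-resource pass/fail comparison described above can be sketched with stand-in types. Everything in this example is hypothetical: the struct names, their fields, and the order-insensitive comparison rule are assumptions made for illustration, and the real resource_binding, expected_binding, and binding_validation types may be laid out and compared differently.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for both the actual and the expected binding.
struct DemoBinding {
  std::vector<int> cpus;                  // CPU indices
  std::vector<int> numa_nodes;            // NUMA memory nodes
  std::vector<std::string> net_devices;   // network device names
};

// Hypothetical stand-in for the validation result: one flag per resource.
struct DemoValidation {
  bool cpu_ok;
  bool memory_ok;
  bool network_ok;
  bool all_ok() const { return cpu_ok && memory_ok && network_ok; }
};

// Order-insensitive comparison: sort copies, then compare element-wise.
template <typename T>
bool same_elements(std::vector<T> a, std::vector<T> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}

// Compares each resource independently so that a mismatch in one resource
// does not hide the status of the others.
DemoValidation compare_bindings(DemoBinding const& actual,
                                DemoBinding const& expected) {
  return DemoValidation{
      same_elements(actual.cpus, expected.cpus),
      same_elements(actual.numa_nodes, expected.numa_nodes),
      same_elements(actual.net_devices, expected.net_devices)};
}
```

Reporting each resource separately lets a caller tell, for instance, that CPU affinity matches while the NUMA memory policy does not, instead of receiving a single combined failure.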