Namespaces | |
| detail | |
Classes | |
| struct | Context |
| Context information for the current process/rank. More... | |
Typedefs | |
| using | Rank = std::int32_t |
| Type alias for communicator::Rank. | |
| using | Duration = std::chrono::duration< double > |
| Type alias for a duration in seconds, stored as a double. | |
Enumerations | |
| enum class | BackendType { AUTO , FILE , SLURM } |
| Backend types for process coordination and bootstrapping. More... | |
Functions | |
| Context | init (BackendType type=BackendType::AUTO) |
| Initialize the bootstrap context from environment variables. More... | |
| void | barrier (Context const &ctx) |
| Perform a barrier synchronization across all ranks. More... | |
| void | sync (Context const &ctx) |
| Ensure all previous put() operations are globally visible. More... | |
| void | put (Context const &ctx, std::string const &key, std::string const &value) |
| Store a key-value pair in the coordination backend (rank 0 only). More... | |
| std::string | get (Context const &ctx, std::string const &key, Duration timeout=std::chrono::seconds{30}) |
| Retrieve a value from the coordination backend. More... | |
| std::optional< std::string > | getenv_optional (std::string_view name) |
| Get environment variable as optional string. More... | |
| std::optional< int > | getenv_int (std::string_view name) |
| Parse integer from environment variable. More... | |
| std::string | get_current_cpu_affinity () |
| Get current CPU affinity as a string. More... | |
| std::string | get_ucx_net_devices () |
| Get UCX_NET_DEVICES from environment. More... | |
| int | get_gpu_id () |
| Get GPU ID from CUDA_VISIBLE_DEVICES. More... | |
| bool | is_running_with_rrun () |
| Check if the current process was launched via rrun. More... | |
| bool | is_running_with_slurm () |
| Check if the current process is running under Slurm with PMIx. More... | |
| Rank | get_rank () |
| Get the current bootstrap rank. More... | |
| Rank | get_nranks () |
| Get the number of bootstrap ranks. More... | |
| std::vector< int > | parse_cpu_list (std::string const &cpulist) |
| Parse CPU list string into vector of core IDs. More... | |
| bool | compare_cpu_affinity (std::string const &actual, std::string const &expected) |
| Compare two CPU affinity strings (order-independent). More... | |
| bool | compare_device_lists (std::string const &actual, std::string const &expected) |
| Compare two comma-separated device lists (order-independent). More... | |
SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0
| enum class rapidsmpf::bootstrap::BackendType | strong |
Backend types for process coordination and bootstrapping.
Definition at line 18 of file backend.hpp.
| void rapidsmpf::bootstrap::barrier | ( | Context const & | ctx | ) |
Perform a barrier synchronization across all ranks.
This ensures all ranks reach this point before any rank proceeds.
| ctx | Bootstrap context. |
| bool rapidsmpf::bootstrap::compare_cpu_affinity | ( | std::string const & | actual, |
| std::string const & | expected | ||
| ) |
Compare two CPU affinity strings (order-independent).
Compares two CPU affinity strings by parsing them into sorted lists of core IDs and checking if they contain the same cores, regardless of order or formatting.
| actual | Actual CPU affinity string. |
| expected | Expected CPU affinity string. |
| bool rapidsmpf::bootstrap::compare_device_lists | ( | std::string const & | actual, |
| std::string const & | expected | ||
| ) |
Compare two comma-separated device lists (order-independent).
Compares two comma-separated device lists by parsing them into sorted vectors and checking if they contain the same devices, regardless of order.
| actual | Actual device list string. |
| expected | Expected device list string. |
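The order-independent comparison described above can be sketched as follows. This is a minimal illustration and not the library's implementation: `split_sorted` and `compare_device_lists_sketch` are hypothetical names, and the sketch assumes entries contain no surrounding whitespace and that empty entries can be ignored.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated string into sorted tokens.
// Empty tokens (e.g. from a trailing comma) are skipped by assumption.
std::vector<std::string> split_sorted(std::string const& list) {
    std::vector<std::string> items;
    std::istringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) {
            items.push_back(item);
        }
    }
    std::sort(items.begin(), items.end());
    return items;
}

// Illustrative stand-in for compare_device_lists: two lists match when
// their sorted token vectors are equal, regardless of the original order.
bool compare_device_lists_sketch(std::string const& actual,
                                 std::string const& expected) {
    return split_sorted(actual) == split_sorted(expected);
}
```

Sorting both sides before comparing means duplicates still count: a list that names a device twice does not match one that names it once.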
| std::string rapidsmpf::bootstrap::get | ( | Context const & | ctx, |
| std::string const & | key, | ||
| Duration | timeout = std::chrono::seconds{30} | ||
| ) |
Retrieve a value from the coordination backend.
Any rank (including rank 0) can call this function to retrieve values published by rank 0. This function blocks until the key is available or timeout occurs.
| ctx | Bootstrap context. |
| key | Key name to retrieve. |
| timeout | Timeout duration. |
| std::runtime_error | if key not found within timeout. |
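The blocking behaviour of `get()` (wait until the key appears, otherwise throw `std::runtime_error` once the timeout elapses) can be illustrated with a simple polling loop. The sketch below is self-contained and does not use the library: `get_with_timeout`, the `lookup` callback, and the `poll_interval` parameter are assumptions made for illustration, and a real backend may wait for the key in an entirely different way.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>

using Duration = std::chrono::duration<double>;  // seconds, as a double

// Hypothetical helper: retry `lookup` until it yields a value or until
// `timeout` has elapsed, in which case std::runtime_error is thrown.
std::string get_with_timeout(
    std::function<std::optional<std::string>()> const& lookup,
    Duration timeout = std::chrono::seconds{30},
    Duration poll_interval = std::chrono::milliseconds{10}) {
    auto const deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (auto value = lookup()) {
            return *value;  // key became available
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            throw std::runtime_error("key not found within timeout");
        }
        std::this_thread::sleep_for(poll_interval);
    }
}
```

Callers that can tolerate a missing key would wrap the call in a `try` block and catch `std::runtime_error`.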
| std::string rapidsmpf::bootstrap::get_current_cpu_affinity | ( | ) |
Get current CPU affinity as a string.
Queries the current process's CPU affinity mask and formats it as a comma-separated list of CPU core IDs, with ranges represented as "start-end".
Example output: "0-3,8-11" for cores 0,1,2,3,8,9,10,11
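The range formatting shown in the example output can be sketched as below. This covers only the formatting step, not the query of the affinity mask. `format_cpu_list` is a hypothetical name, and the sketch assumes the input is sorted and duplicate-free and that a run of two cores is also written as a range; the library may format such cases differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: collapse a sorted, duplicate-free list of core IDs
// into a comma-separated string, writing consecutive runs as "start-end".
std::string format_cpu_list(std::vector<int> const& cores) {
    std::string out;
    std::size_t i = 0;
    while (i < cores.size()) {
        // Extend j to the last core of the consecutive run starting at i.
        std::size_t j = i;
        while (j + 1 < cores.size() && cores[j + 1] == cores[j] + 1) {
            ++j;
        }
        if (!out.empty()) {
            out += ',';
        }
        out += std::to_string(cores[i]);
        if (j > i) {
            out += '-' + std::to_string(cores[j]);
        }
        i = j + 1;
    }
    return out;
}
```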
| int rapidsmpf::bootstrap::get_gpu_id | ( | ) |
Get GPU ID from CUDA_VISIBLE_DEVICES.
Attempts to determine the GPU ID assigned to this process by checking the CUDA_VISIBLE_DEVICES environment variable.
| Rank rapidsmpf::bootstrap::get_nranks | ( | ) |
Get the number of bootstrap ranks.
This helper retrieves the number of ranks when running with a bootstrap launcher (rrun or Slurm). Checks environment variables in order:
| std::runtime_error | if not running with a bootstrap launcher or if the environment variable cannot be parsed. |
| Rank rapidsmpf::bootstrap::get_rank | ( | ) |
Get the current bootstrap rank.
This helper retrieves the rank of the current process when running with a bootstrap launcher (rrun or Slurm). Checks environment variables in order:
| std::runtime_error | if not running with a bootstrap launcher or if the environment variable cannot be parsed. |
| std::string rapidsmpf::bootstrap::get_ucx_net_devices | ( | ) |
Get UCX_NET_DEVICES from environment.
Retrieves the value of the UCX_NET_DEVICES environment variable, which specifies which network devices UCX should use for communication.
| std::optional<int> rapidsmpf::bootstrap::getenv_int | ( | std::string_view | name | ) |
Parse integer from environment variable.
Retrieves an environment variable and parses it as an integer.
| name | Name of the environment variable to retrieve. |
| std::runtime_error | if the variable is set but cannot be parsed as an integer. |
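The documented contract (empty result when the variable is unset, `std::runtime_error` when it is set but not an integer) can be sketched with the standard library alone. `getenv_int_sketch` is a hypothetical name. Rejecting trailing characters such as `"4x"` is an assumption of this sketch, not a confirmed behaviour of the library.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Illustrative stand-in for getenv_int, not the library's implementation.
std::optional<int> getenv_int_sketch(std::string_view name) {
    // std::getenv needs a null-terminated string, so copy the view first.
    char const* raw = std::getenv(std::string(name).c_str());
    if (raw == nullptr) {
        return std::nullopt;  // variable is not set
    }
    std::string const text(raw);
    try {
        std::size_t consumed = 0;
        int const value = std::stoi(text, &consumed);
        // Assumption: the whole string must be a valid integer.
        if (consumed != text.size()) {
            throw std::invalid_argument("trailing characters");
        }
        return value;
    } catch (std::exception const&) {
        throw std::runtime_error("cannot parse " + std::string(name) + "='" +
                                 text + "' as an integer");
    }
}
```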
| std::optional<std::string> rapidsmpf::bootstrap::getenv_optional | ( | std::string_view | name | ) |
Get environment variable as optional string.
Retrieves the value of an environment variable by name, returning it as std::optional<std::string>. Returns std::nullopt if the variable is not set.
| name | Name of the environment variable to retrieve. |
| Context rapidsmpf::bootstrap::init | ( | BackendType | type = BackendType::AUTO | ) |
Initialize the bootstrap context from environment variables.
This function reads environment variables to determine rank, nranks, and backend configuration. It should be called early in the application lifecycle.
Environment variables checked (in order of precedence):
| type | Backend type to use (default: AUTO for auto-detection). |
| std::runtime_error | if environment is not properly configured. |
| bool rapidsmpf::bootstrap::is_running_with_rrun | ( | ) |
Check if the current process was launched via rrun.
This helper detects bootstrap mode by checking for the presence of the RRUN_RANK environment variable, which is set by rrun.
Returns true if running in rrun bootstrap mode, false otherwise.
| bool rapidsmpf::bootstrap::is_running_with_slurm | ( | ) |
Check if the current process is running under Slurm with PMIx.
This helper detects Slurm environment by checking for PMIx namespace or Slurm job step environment variables.
| std::vector<int> rapidsmpf::bootstrap::parse_cpu_list | ( | std::string const & | cpulist | ) |
Parse CPU list string into vector of core IDs.
Parses a comma-separated CPU list string that may contain ranges (e.g., "0-3,8-11") into a vector of individual CPU core IDs.
| cpulist | CPU list string (e.g., "0-3,8-11" or "0,1,2,3"). |
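A minimal sketch of the parsing described above, written against the standard library only. `parse_cpu_list_sketch` is a hypothetical name and not the library's code. It assumes well-formed input (no whitespace, ascending ranges) and lets `std::stoi` throw on malformed tokens; how the real function handles bad input is not stated here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative stand-in for parse_cpu_list: expands "0-3,8-11" into
// individual core IDs, preserving the order in which they appear.
std::vector<int> parse_cpu_list_sketch(std::string const& cpulist) {
    std::vector<int> cores;
    std::istringstream stream(cpulist);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (token.empty()) {
            continue;  // tolerate empty entries, e.g. a trailing comma
        }
        auto const dash = token.find('-');
        if (dash == std::string::npos) {
            cores.push_back(std::stoi(token));  // single core ID
        } else {
            // Range "first-last", inclusive on both ends.
            int const first = std::stoi(token.substr(0, dash));
            int const last = std::stoi(token.substr(dash + 1));
            for (int core = first; core <= last; ++core) {
                cores.push_back(core);
            }
        }
    }
    return cores;
}
```

With this sketch, both `"0-3"` and `"0,1,2,3"` expand to the same four core IDs, which is what makes a comparison "regardless of formatting" possible.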
| void rapidsmpf::bootstrap::put | ( | Context const & | ctx, |
| std::string const & | key, | ||
| std::string const & | value | ||
| ) |
Store a key-value pair in the coordination backend (rank 0 only).
Only rank 0 should call this function. The key-value pair is made visible to all ranks after a sync() call. Use this for custom coordination such as UCXX address exchange.
| ctx | Bootstrap context. |
| key | Key name. |
| value | Value to store. |
| std::runtime_error | if called by non-zero rank. |
| void rapidsmpf::bootstrap::sync | ( | Context const & | ctx | ) |
Ensure all previous put() operations are globally visible.
Different backends have different visibility semantics for put() operations:
This function abstracts these differences. Call sync() after put() operations to ensure data is visible to other ranks before they attempt get().
| ctx | Bootstrap context. |