rapidsmpf::bootstrap Namespace Reference

Namespaces

 detail
 

Classes

struct  Context
 Context information for the current process/rank.
 

Typedefs

using Rank = std::int32_t
 Type alias for communicator::Rank.
 
using Duration = std::chrono::duration< double >
 Duration type used for timeouts, expressed in seconds as a double.
 

Enumerations

enum class  BackendType { AUTO , FILE , SLURM }
 Backend types for process coordination and bootstrapping.
 

Functions

Context init (BackendType type=BackendType::AUTO)
 Initialize the bootstrap context from environment variables.

void barrier (Context const &ctx)
 Perform a barrier synchronization across all ranks.

void sync (Context const &ctx)
 Ensure all previous put() operations are globally visible.

void put (Context const &ctx, std::string const &key, std::string const &value)
 Store a key-value pair in the coordination backend (rank 0 only).

std::string get (Context const &ctx, std::string const &key, Duration timeout=std::chrono::seconds{30})
 Retrieve a value from the coordination backend.

std::optional< std::string > getenv_optional (std::string_view name)
 Get environment variable as optional string.

std::optional< int > getenv_int (std::string_view name)
 Parse integer from environment variable.

std::string get_current_cpu_affinity ()
 Get current CPU affinity as a string.

std::string get_ucx_net_devices ()
 Get UCX_NET_DEVICES from environment.

int get_gpu_id ()
 Get GPU ID from CUDA_VISIBLE_DEVICES.

bool is_running_with_rrun ()
 Check if the current process was launched via rrun.

bool is_running_with_slurm ()
 Check if the current process is running under Slurm with PMIx.

Rank get_rank ()
 Get the current bootstrap rank.

Rank get_nranks ()
 Get the number of bootstrap ranks.

std::vector< int > parse_cpu_list (std::string const &cpulist)
 Parse CPU list string into vector of core IDs.

bool compare_cpu_affinity (std::string const &actual, std::string const &expected)
 Compare two CPU affinity strings (order-independent).

bool compare_device_lists (std::string const &actual, std::string const &expected)
 Compare two comma-separated device lists (order-independent).
 

Detailed Description

SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0


Enumeration Type Documentation

◆ BackendType

Backend types for process coordination and bootstrapping.

Enumerator
AUTO 

Automatically detect the best backend based on environment.

Detection order:

  1. File-based (if RRUN_COORD_DIR set by rrun)
  2. Slurm/PMIx (if SLURM environment detected)
  3. File-based (default fallback)
FILE 

File-based coordination using a shared directory.

Uses the filesystem for rank coordination and address exchange. Works on a single node and, with shared storage (e.g., NFS), across multiple nodes launched via SSH. Requires the RRUN_RANK, RRUN_NRANKS, and RRUN_COORD_DIR environment variables.

SLURM 

Slurm-based coordination using PMIx.

Uses PMIx (Process Management Interface for Exascale) for scalable process coordination without requiring a shared filesystem. Designed for Slurm clusters and supports multi-node deployments.

Run with: srun --mpi=pmix -n <nranks> ./program

Environment variables (automatically set by Slurm):

  • PMIX_NAMESPACE: PMIx namespace identifier
  • SLURM_PROCID: Process rank
  • SLURM_NPROCS/SLURM_NTASKS: Total number of processes

Definition at line 18 of file backend.hpp.
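
The AUTO detection order can be pictured as a chain of environment checks. The following is a minimal sketch, not the library's implementation: the Detected enum and detect_backend_sketch() are hypothetical names, and treating PMIX_NAMESPACE as the sign of a Slurm/PMIx environment is an assumption based on the variables listed above.

```cpp
#include <cstdlib>

// Hypothetical result type for this sketch (not part of rapidsmpf).
enum class Detected { FileFromRrun, SlurmPmix, FileFallback };

// True if the named environment variable is set, even to an empty string.
inline bool env_is_set(char const* name) {
    return std::getenv(name) != nullptr;
}

// Sketch of the documented AUTO order:
//   1. file-based if RRUN_COORD_DIR is set,
//   2. Slurm/PMIx if a Slurm environment is detected (assumed: PMIX_NAMESPACE),
//   3. file-based as the default fallback.
inline Detected detect_backend_sketch() {
    if (env_is_set("RRUN_COORD_DIR")) {
        return Detected::FileFromRrun;
    }
    if (env_is_set("PMIX_NAMESPACE")) {
        return Detected::SlurmPmix;
    }
    return Detected::FileFallback;
}
```

Under this ordering, a process started by rrun inside a Slurm allocation would still select the file-based backend, because RRUN_COORD_DIR is checked first.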

Function Documentation

◆ barrier()

void rapidsmpf::bootstrap::barrier ( Context const &  ctx)

Perform a barrier synchronization across all ranks.

This ensures all ranks reach this point before any rank proceeds.

Parameters
ctx: Bootstrap context.

◆ compare_cpu_affinity()

bool rapidsmpf::bootstrap::compare_cpu_affinity ( std::string const &  actual,
std::string const &  expected 
)

Compare two CPU affinity strings (order-independent).

Compares two CPU affinity strings by parsing them into sorted lists of core IDs and checking if they contain the same cores, regardless of order or formatting.

Parameters
actual: Actual CPU affinity string.
expected: Expected CPU affinity string.
Returns
true if both strings represent the same set of CPU cores, false otherwise.

◆ compare_device_lists()

bool rapidsmpf::bootstrap::compare_device_lists ( std::string const &  actual,
std::string const &  expected 
)

Compare two comma-separated device lists (order-independent).

Compares two comma-separated device lists by parsing them into sorted vectors and checking if they contain the same devices, regardless of order.

Parameters
actual: Actual device list string.
expected: Expected device list string.
Returns
true if both strings represent the same set of devices, false otherwise.
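
An order-independent comparison of this kind can be sketched by splitting each list on commas, sorting the pieces, and comparing the results. This is an illustrative sketch only: split_sorted() and compare_device_lists_sketch() are hypothetical names, and skipping empty entries (and not trimming whitespace) is an assumption about edge cases the description does not cover.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list into its entries and sort them.
// Assumption: empty entries (e.g. from "a,,b") are skipped.
inline std::vector<std::string> split_sorted(std::string const& list) {
    std::vector<std::string> items;
    std::istringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) {
            items.push_back(item);
        }
    }
    std::sort(items.begin(), items.end());
    return items;
}

// Two lists match when their sorted entries are identical.
inline bool compare_device_lists_sketch(std::string const& actual,
                                        std::string const& expected) {
    return split_sorted(actual) == split_sorted(expected);
}
```

With this sketch, "mlx5_0:1,mlx5_1:1" and "mlx5_1:1,mlx5_0:1" compare equal, while lists of different length do not.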

◆ get()

std::string rapidsmpf::bootstrap::get ( Context const &  ctx,
std::string const &  key,
Duration  timeout = std::chrono::seconds{30} 
)

Retrieve a value from the coordination backend.

Any rank (including rank 0) can call this function to retrieve values published by rank 0. This function blocks until the key is available or the timeout expires.

Parameters
ctx: Bootstrap context.
key: Key name to retrieve.
timeout: Timeout duration.
Returns
Value associated with the key.
Exceptions
std::runtime_error if the key is not found within the timeout.
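
Because Duration is std::chrono::duration< double >, a count of seconds stored as a double, integral std::chrono durations convert to it implicitly. That is why the default timeout can be written as std::chrono::seconds{30}. The short sketch below illustrates the conversion; to_seconds() is a hypothetical helper used only for this example.

```cpp
#include <chrono>

// Same definition as the Duration typedef documented on this page.
using Duration = std::chrono::duration<double>;

// Hypothetical helper: report a Duration as a plain number of seconds.
inline double to_seconds(Duration d) {
    return d.count();
}

// Integral durations convert implicitly because the target
// representation is floating point, so no duration_cast is needed.
inline double half_second() {
    return to_seconds(std::chrono::milliseconds{500});
}
```

A caller could therefore pass std::chrono::milliseconds{500} or std::chrono::minutes{2} as the timeout argument without an explicit cast.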

◆ get_current_cpu_affinity()

std::string rapidsmpf::bootstrap::get_current_cpu_affinity ( )

Get current CPU affinity as a string.

Queries the current process's CPU affinity mask and formats it as a comma-separated list of CPU core IDs, with ranges represented as "start-end".

Example output: "0-3,8-11" for cores 0,1,2,3,8,9,10,11

Returns
CPU affinity string, or empty string on error.
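
The "start-end" range format can be illustrated by collapsing runs of consecutive core IDs. The sketch below covers only the formatting step and does not query the affinity mask; format_cpu_list_sketch() is a hypothetical name, the input is assumed to be sorted without duplicates, and how the library formats a run of exactly two cores is not stated in the description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collapse a sorted, duplicate-free list of core IDs into a string
// such as "0-3,8-11". Single cores are written without a range.
inline std::string format_cpu_list_sketch(std::vector<int> const& cores) {
    std::string out;
    std::size_t i = 0;
    while (i < cores.size()) {
        // Extend j to the last core of the consecutive run starting at i.
        std::size_t j = i;
        while (j + 1 < cores.size() && cores[j + 1] == cores[j] + 1) {
            ++j;
        }
        if (!out.empty()) {
            out += ',';
        }
        out += std::to_string(cores[i]);
        if (j > i) {
            out += '-' + std::to_string(cores[j]);
        }
        i = j + 1;
    }
    return out;
}
```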

◆ get_gpu_id()

int rapidsmpf::bootstrap::get_gpu_id ( )

Get GPU ID from CUDA_VISIBLE_DEVICES.

Attempts to determine the GPU ID assigned to this process by checking the CUDA_VISIBLE_DEVICES environment variable.

Returns
GPU ID (>= 0) if found, -1 otherwise.

◆ get_nranks()

Rank rapidsmpf::bootstrap::get_nranks ( )

Get the number of bootstrap ranks.

This helper retrieves the number of ranks when running with a bootstrap launcher (rrun or Slurm). Checks environment variables in order:

  1. RRUN_NRANKS (set by rrun)
  2. SLURM_NPROCS (set by Slurm)
  3. SLURM_NTASKS (set by Slurm)
Returns
Number of ranks.
Exceptions
std::runtime_error if not running with a bootstrap launcher or if the environment variable cannot be parsed.

◆ get_rank()

Rank rapidsmpf::bootstrap::get_rank ( )

Get the current bootstrap rank.

This helper retrieves the rank of the current process when running with a bootstrap launcher (rrun or Slurm). Checks environment variables in order:

  1. RRUN_RANK (set by rrun)
  2. PMIX_RANK (set by PMIx)
  3. SLURM_PROCID (set by Slurm)
Returns
Rank of the current process.
Exceptions
std::runtime_error if not running with a bootstrap launcher or if the environment variable cannot be parsed.
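
The lookup order above amounts to "first variable that is set wins". A minimal sketch of that precedence follows, assuming plain std::getenv lookups; get_rank_sketch() is a hypothetical name and the error messages are illustrative, not the library's.

```cpp
#include <cstdint>
#include <cstdlib>
#include <initializer_list>
#include <stdexcept>
#include <string>

using Rank = std::int32_t;

// Return the rank from the first variable that is set, in documented order.
inline Rank get_rank_sketch() {
    for (char const* name : {"RRUN_RANK", "PMIX_RANK", "SLURM_PROCID"}) {
        char const* raw = std::getenv(name);
        if (raw == nullptr) {
            continue;  // not set, try the next launcher's variable
        }
        try {
            // std::stoi throws if the value has no leading integer.
            return static_cast<Rank>(std::stoi(raw));
        } catch (std::exception const&) {
            throw std::runtime_error(std::string{"Cannot parse "} + name + "=" + raw);
        }
    }
    throw std::runtime_error("Not running with a bootstrap launcher");
}
```

The same pattern would apply to get_nranks() with RRUN_NRANKS, SLURM_NPROCS, and SLURM_NTASKS.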

◆ get_ucx_net_devices()

std::string rapidsmpf::bootstrap::get_ucx_net_devices ( )

Get UCX_NET_DEVICES from environment.

Retrieves the value of the UCX_NET_DEVICES environment variable, which specifies which network devices UCX should use for communication.

Returns
Value of UCX_NET_DEVICES, or empty string if not set.

◆ getenv_int()

std::optional<int> rapidsmpf::bootstrap::getenv_int ( std::string_view  name)

Parse integer from environment variable.

Retrieves an environment variable and parses it as an integer.

Parameters
name: Name of the environment variable to retrieve.
Returns
Parsed integer value, or std::nullopt if not set.
Exceptions
std::runtime_error if the variable is set but cannot be parsed as an integer.
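
The three outcomes (unset, valid integer, unparsable value) can be sketched as follows. This is an assumption-laden illustration rather than the library's code: getenv_int_sketch() is a hypothetical name, and rejecting trailing characters such as "4x" is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

inline std::optional<int> getenv_int_sketch(std::string_view name) {
    // std::getenv needs a null-terminated string, so copy the view first.
    std::string const key{name};
    char const* raw = std::getenv(key.c_str());
    if (raw == nullptr) {
        return std::nullopt;  // variable is not set
    }
    std::string const text{raw};
    try {
        std::size_t consumed = 0;
        int const value = std::stoi(text, &consumed);
        if (consumed != text.size()) {
            throw std::invalid_argument("trailing characters");
        }
        return value;
    } catch (std::exception const&) {
        // Map every parse failure to the documented exception type.
        throw std::runtime_error("Cannot parse " + key + "=\"" + text + "\" as an integer");
    }
}
```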

◆ getenv_optional()

std::optional<std::string> rapidsmpf::bootstrap::getenv_optional ( std::string_view  name)

Get environment variable as optional string.

Retrieves the value of an environment variable by name, returning it as std::optional<std::string>. Returns std::nullopt if the variable is not set.

Parameters
name: Name of the environment variable to retrieve.
Returns
Value of the environment variable, or std::nullopt if not set.
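
A wrapper of this shape is typically a thin layer over std::getenv. The sketch below shows one way to write it; getenv_optional_sketch() is a hypothetical name, and treating a variable set to the empty string as "set" is an assumption the description does not confirm.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

inline std::optional<std::string> getenv_optional_sketch(std::string_view name) {
    // A string_view is not guaranteed to be null-terminated, so copy it
    // into a std::string before handing it to std::getenv.
    std::string const key{name};
    char const* value = std::getenv(key.c_str());
    if (value == nullptr) {
        return std::nullopt;
    }
    return std::string{value};
}
```

Callers can then use value_or() to supply a default, for example getenv_optional_sketch("UCX_NET_DEVICES").value_or("").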

◆ init()

Context rapidsmpf::bootstrap::init ( BackendType  type = BackendType::AUTO)

Initialize the bootstrap context from environment variables.

This function reads environment variables to determine rank, nranks, and backend configuration. It should be called early in the application lifecycle.

Environment variables checked (in order of precedence):

  • RRUN_RANK: Explicitly set rank
  • RRUN_NRANKS: Explicitly set total rank count
  • RRUN_COORD_DIR: File-based coordination directory
Parameters
type: Backend type to use (default: AUTO for auto-detection).
Returns
Context object containing rank and coordination information.
Exceptions
std::runtime_error if the environment is not properly configured.

Example:

    auto ctx = rapidsmpf::bootstrap::init();
    std::cout << "I am rank " << ctx.rank << " of " << ctx.nranks << std::endl;

◆ is_running_with_rrun()

bool rapidsmpf::bootstrap::is_running_with_rrun ( )

Check if the current process was launched via rrun.

This helper detects bootstrap mode by checking for the presence of the RRUN_RANK environment variable, which is set by rrun.

Returns
true if running under rrun bootstrap mode, false otherwise.

◆ is_running_with_slurm()

bool rapidsmpf::bootstrap::is_running_with_slurm ( )

Check if the current process is running under Slurm with PMIx.

This helper detects a Slurm environment by checking for the PMIx namespace or Slurm job-step environment variables.

Returns
true if running under Slurm with PMIx, false otherwise.

◆ parse_cpu_list()

std::vector<int> rapidsmpf::bootstrap::parse_cpu_list ( std::string const &  cpulist)

Parse CPU list string into vector of core IDs.

Parses a comma-separated CPU list string that may contain ranges (e.g., "0-3,8-11") into a vector of individual CPU core IDs.

Parameters
cpulist: CPU list string (e.g., "0-3,8-11" or "0,1,2,3").
Returns
Vector of CPU core IDs. Empty if parsing fails or input is empty.
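
The expansion of ranges into individual core IDs can be sketched as below. This is not the library's implementation: parse_core_id() and parse_cpu_list_sketch() are hypothetical names, and returning an empty vector for any malformed token (including a reversed range such as "3-1") is an assumption consistent with, but not dictated by, the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: parse a non-negative decimal integer, or return -1.
// Inputs longer than 9 digits are rejected to avoid int overflow.
inline int parse_core_id(std::string const& s) {
    if (s.empty() || s.size() > 9) {
        return -1;
    }
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') {
            return -1;
        }
        value = value * 10 + (c - '0');
    }
    return value;
}

// Expand "0-3,8-11" into {0,1,2,3,8,9,10,11}.
// Returns an empty vector on empty or malformed input.
inline std::vector<int> parse_cpu_list_sketch(std::string const& cpulist) {
    std::vector<int> cores;
    std::size_t pos = 0;
    while (pos <= cpulist.size()) {
        std::size_t comma = cpulist.find(',', pos);
        if (comma == std::string::npos) {
            comma = cpulist.size();
        }
        std::string const token = cpulist.substr(pos, comma - pos);
        std::size_t const dash = token.find('-');
        int const first = parse_core_id(token.substr(0, dash));
        int const last = (dash == std::string::npos)
                             ? first
                             : parse_core_id(token.substr(dash + 1));
        if (first < 0 || last < first) {
            return {};
        }
        for (int id = first; id <= last; ++id) {
            cores.push_back(id);
        }
        pos = comma + 1;
    }
    return cores;
}
```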

◆ put()

void rapidsmpf::bootstrap::put ( Context const &  ctx,
std::string const &  key,
std::string const &  value 
)

Store a key-value pair in the coordination backend (rank 0 only).

Only rank 0 should call this function. The key-value pair is made visible to all ranks after a sync() call. Use this for custom coordination such as UCXX address exchange.

Parameters
ctx: Bootstrap context.
key: Key name.
value: Value to store.
Exceptions
std::runtime_error if called by non-zero rank.

◆ sync()

void rapidsmpf::bootstrap::sync ( Context const &  ctx)

Ensure all previous put() operations are globally visible.

Different backends have different visibility semantics for put() operations:

  • Slurm/PMIx: Requires explicit fence (PMIx_Fence) to make data visible across nodes.
  • FILE: put() operations are immediately visible via atomic filesystem operations.

This function abstracts these differences. Call sync() after put() operations to ensure data is visible to other ranks before they attempt get().

Parameters
ctx: Bootstrap context.
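
To illustrate why a file-based put() can be immediately visible, the sketch below publishes a value by writing a temporary file and renaming it into place, and reads it by polling until the file exists or a timeout expires. It is a generic illustration of the technique, not the rapidsmpf FILE backend: file_put(), file_get(), the ".tmp" suffix, the one-file-per-key layout, and the 10 ms poll interval are all assumptions made for this example.

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <thread>

namespace fs = std::filesystem;
using Duration = std::chrono::duration<double>;

// Publish a value: write a temporary file, then rename it to its final name.
// On POSIX filesystems the rename is atomic, so readers see either no file
// or the complete value, never a partial write.
inline void file_put(fs::path const& dir, std::string const& key, std::string const& value) {
    fs::path const tmp = dir / (key + ".tmp");
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << value;
    }  // stream is flushed and closed here
    fs::rename(tmp, dir / key);
}

// Poll for the key until it appears or the timeout expires.
inline std::string file_get(fs::path const& dir, std::string const& key, Duration timeout) {
    auto const deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        std::ifstream in(dir / key, std::ios::binary);
        if (in) {
            return std::string(std::istreambuf_iterator<char>(in),
                               std::istreambuf_iterator<char>());
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            throw std::runtime_error("Key '" + key + "' not found within timeout");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

In a scheme like this no separate fence is needed, which matches the FILE note above; a PMIx-style backend, by contrast, would need an explicit fence before readers can rely on the data.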