RAPIDS Multi-Processor interfaces. More...
Namespaces | |
| bootstrap | |
| coll | |
| Collective communication interfaces. | |
| communicator | |
| config | |
| detail | |
| mpi | |
| Collection of helpful MPI functions. | |
| shuffler | |
| Shuffler interfaces. | |
| streaming | |
Classes | |
| class | Tag |
| A tag used for identifying messages in a communication operation. More... | |
| class | Communicator |
| Abstract base class for a communication mechanism between nodes. More... | |
| class | MPI |
| MPI communicator class that implements the Communicator interface. More... | |
| class | Single |
| Single process communicator class that implements the Communicator interface. More... | |
| class | CudaEvent |
| RAII wrapper for a CUDA event with convenience methods. More... | |
| struct | cuda_error |
| Exception thrown when a CUDA error is encountered. More... | |
| class | bad_alloc |
| Exception thrown when a RapidsMPF allocation fails. More... | |
| class | out_of_memory |
| Exception thrown when RapidsMPF runs out of memory. More... | |
| class | reservation_error |
| Exception thrown when a memory reservation fails in RapidsMPF. More... | |
| struct | BloomFilter |
| A bloom filter, used for approximate set membership queries. More... | |
| class | Buffer |
| Buffer representing device or host memory. More... | |
| class | BufferResource |
| Class managing buffer resources. More... | |
| class | LimitAvailableMemory |
| A functor for querying the remaining available memory within a defined limit from an RMM statistics resource. More... | |
| class | ContentDescription |
| Description of an object's content. More... | |
| class | HostBuffer |
| Block of host memory. More... | |
| class | HostMemoryResource |
| Host memory resource using standard CPU allocation. More... | |
| class | MemoryReservation |
| Represents a reservation for future memory allocation. More... | |
| struct | PackedData |
| Bag of bytes with metadata suitable for sending over the wire. More... | |
| struct | PinnedPoolProperties |
| Properties for configuring a pinned memory pool. More... | |
| class | PinnedMemoryResource |
| Memory resource that provides pinned (page-locked) host memory using a pool. More... | |
| struct | ScopedMemoryRecord |
| Memory statistics for a specific scope. More... | |
| class | SpillManager |
| Manages memory spilling to free up device memory when needed. More... | |
| class | OwningWrapper |
| Utility class to store an arbitrary type-erased object while another object is alive. More... | |
| class | ProgressThread |
| A progress thread that can execute arbitrary functions. More... | |
| class | RmmResourceAdaptor |
| An RMM memory resource adaptor tailored to RapidsMPF. More... | |
| class | Statistics |
| Tracks statistics across rapidsmpf operations. More... | |
| class | StreamOrderedTiming |
| Stream-ordered wall-clock timer that records its result into Statistics. More... | |
| struct | overloaded |
| Helper for overloaded lambdas using std::visit. More... | |
Typedefs | |
| using | Rank = std::int32_t |
| The rank of a node (e.g. the rank of an MPI process), or world size (total number of ranks). More... | |
| using | OpID = std::int32_t |
| Operation ID defined by the user. This allows users to concurrently execute multiple operations, and each operation will be identified by its OpID. More... | |
| using | StageID = std::int32_t |
| Identifier for a stage of a communication operation. More... | |
| using | Clock = std::chrono::high_resolution_clock |
| Alias for high-resolution clock from the chrono library. | |
| using | Duration = std::chrono::duration< double > |
| Alias for a duration type representing time in seconds as a double. | |
| using | TimePoint = std::chrono::time_point< Clock, Duration > |
| Alias for a time point with double precision in seconds. | |
Enumerations | |
| enum class | AllowOverbooking : bool { NO , YES } |
| Policy controlling whether a memory reservation is allowed to overbook. More... | |
| enum class | MemoryType : int { DEVICE = 0 , PINNED_HOST = 1 , HOST = 2 } |
| Enum representing the type of memory sorted in decreasing order of preference. More... | |
| enum class | TrimZeroFraction { NO , YES } |
| Control whether a zero fractional part is omitted when formatting values. More... | |
Functions | |
| std::ostream & | operator<< (std::ostream &os, Communicator const &obj) |
| Overloads the stream insertion operator for the Communicator class. More... | |
| template<typename Range1 , typename Range2 > | |
| void | cuda_stream_join (Range1 const &downstreams, Range2 const &upstreams, CudaEvent *event=nullptr) |
| Make downstream CUDA streams wait on upstream CUDA streams. More... | |
| void | cuda_stream_join (rmm::cuda_stream_view downstream, rmm::cuda_stream_view upstream, CudaEvent *event=nullptr) |
| Make a downstream CUDA stream wait on an upstream CUDA stream. More... | |
| std::pair< std::vector< cudf::table_view >, std::unique_ptr< cudf::table > > | partition_and_split (cudf::table_view const &table, std::vector< cudf::size_type > const &columns_to_hash, int num_partitions, cudf::hash_id hash_function, std::uint32_t seed, rmm::cuda_stream_view stream, BufferResource *br, AllowOverbooking allow_overbooking=AllowOverbooking::YES) |
| Partitions rows from the input table into multiple output tables. More... | |
| std::unordered_map< shuffler::PartID, PackedData > | partition_and_pack (cudf::table_view const &table, std::vector< cudf::size_type > const &columns_to_hash, int num_partitions, cudf::hash_id hash_function, std::uint32_t seed, rmm::cuda_stream_view stream, BufferResource *br, AllowOverbooking allow_overbooking=AllowOverbooking::YES) |
| Partitions rows from the input table into multiple packed (serialized) tables. More... | |
| std::unordered_map< shuffler::PartID, PackedData > | split_and_pack (cudf::table_view const &table, std::vector< cudf::size_type > const &splits, rmm::cuda_stream_view stream, BufferResource *br, AllowOverbooking allow_overbooking=AllowOverbooking::YES) |
| Splits rows from the input table into multiple packed (serialized) tables. More... | |
| std::unique_ptr< cudf::table > | unpack_and_concat (std::vector< PackedData > &&partitions, rmm::cuda_stream_view stream, BufferResource *br, AllowOverbooking allow_overbooking=AllowOverbooking::YES) |
| Unpack (deserialize) input partitions and concatenate them into a single table. More... | |
| std::vector< PackedData > | spill_partitions (std::vector< PackedData > &&partitions, BufferResource *br) |
| Spill partitions from device memory to host memory. More... | |
| std::vector< PackedData > | unspill_partitions (std::vector< PackedData > &&partitions, BufferResource *br, AllowOverbooking allow_overbooking) |
| Move spilled partitions (i.e., packed tables in host memory) back to device memory. More... | |
| std::string | str (cudf::column_view col, cudf::size_type index, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Converts the element at a specific index in a cudf::column_view to a string. More... | |
| std::string | str (cudf::column_view col, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Converts all elements in a cudf::column_view to a string. More... | |
| std::string | str (cudf::table_view tbl, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Converts all rows in a cudf::table_view to a string. More... | |
| std::size_t | estimated_memory_usage (cudf::column_view const &col, rmm::cuda_stream_view stream) |
| Estimate the memory usage of a column. More... | |
| std::size_t | estimated_memory_usage (cudf::table_view const &tbl, rmm::cuda_stream_view stream) |
| Estimate the memory usage of a table. More... | |
| void | buffer_copy (std::shared_ptr< Statistics > statistics, Buffer &dst, Buffer const &src, std::size_t size, std::ptrdiff_t dst_offset=0, std::ptrdiff_t src_offset=0) |
| Asynchronously copy data between buffers. More... | |
| std::unordered_map< MemoryType, BufferResource::MemoryAvailable > | memory_available_from_options (RmmResourceAdaptor *mr, config::Options options) |
| Construct a map of memory-available functions from configuration options. More... | |
| std::optional< Duration > | periodic_spill_check_from_options (config::Options options) |
| Get the periodic_spill_check parameter from configuration options. More... | |
| std::shared_ptr< rmm::cuda_stream_pool > | stream_pool_from_options (config::Options options) |
| Get a new CUDA stream pool from configuration options. More... | |
| constexpr std::span< MemoryType const > | leq_memory_types (MemoryType mem_type) noexcept |
| Get the memory types with preference lower than or equal to mem_type. More... | |
| constexpr char const * | to_string (MemoryType mem_type) |
| Get the name of a MemoryType. More... | |
| std::ostream & | operator<< (std::ostream &os, MemoryType mem_type) |
| Overload to write type name to the output stream. More... | |
| std::istream & | operator>> (std::istream &is, MemoryType &out) |
| Overload to read a MemoryType value from an input stream. More... | |
| bool | is_pinned_memory_resources_supported () |
| Checks if the PinnedMemoryResource is supported for the current CUDA version. More... | |
| std::uint64_t | get_total_host_memory () noexcept |
| Get the total amount of system memory. More... | |
| int | get_current_numa_node () noexcept |
| Get the NUMA node ID associated with the calling CPU thread. More... | |
| std::vector< int > | get_current_numa_nodes () noexcept |
| Get current NUMA node(s) for memory binding. More... | |
| std::uint64_t | get_numa_node_host_memory (int numa_id=get_current_numa_node()) noexcept |
| Get the total amount of host memory for a NUMA node. More... | |
| template<typename MapType > | |
| std::pair< typename MapType::key_type, typename MapType::mapped_type > | extract_item (MapType &map, typename MapType::const_iterator position) |
| Extracts a key-value pair from a map, removing it from the map. More... | |
| template<typename MapType > | |
| std::pair< typename MapType::key_type, typename MapType::mapped_type > | extract_item (MapType &map, typename MapType::key_type const &key) |
| Extracts a key-value pair from a map, removing it from the map. More... | |
| template<typename MapType > | |
| MapType::mapped_type | extract_value (MapType &map, typename MapType::key_type const &key) |
| Extracts the value associated with a specific key from a map, removing the key-value pair. More... | |
| template<typename MapType > | |
| MapType::mapped_type | extract_value (MapType &map, typename MapType::const_iterator position) |
| Extracts the value associated with a specific key from a map, removing the key-value pair. More... | |
| template<typename MapType > | |
| MapType::key_type | extract_key (MapType &map, typename MapType::key_type const &key) |
| Extracts a key from a map, removing the key-value pair. More... | |
| template<typename MapType > | |
| MapType::key_type | extract_key (MapType &map, typename MapType::const_iterator position) |
| Extracts a key from a map, removing the key-value pair. More... | |
| template<typename MapType > | |
| auto | to_vector (MapType &&map) |
| Converts a map-like associative container to a vector by moving the values and discarding the keys. More... | |
| bool | is_running_under_valgrind () |
| Checks whether the application is running under Valgrind. More... | |
| template<typename T > | |
| constexpr T | safe_div (T x, T y) |
| Performs safe division, returning 0 if the denominator is zero. More... | |
| template<std::ranges::input_range R, typename T , typename Proj = std::identity> | |
| constexpr bool | contains (R &&range, T const &value, Proj proj={}) |
| Backport of std::ranges::contains from C++23 for C++20. More... | |
| template<typename To , typename From > requires std::is_arithmetic_v< To > && std::is_arithmetic_v< From > | |
| constexpr To | safe_cast (From value, std::source_location const &loc=std::source_location::current()) |
| Safely casts a numeric value to another type with overflow checking. More... | |
| std::string | trim (std::string_view text) |
| Trims whitespace from both ends of the specified string. More... | |
| std::string | to_lower (std::string_view text) |
| Converts the specified string to lowercase. More... | |
| std::string | to_upper (std::string_view text) |
| Converts the specified string to uppercase. More... | |
| std::string | format_nbytes (double nbytes, int num_decimals=2, TrimZeroFraction trim_zero_fraction=TrimZeroFraction::YES) |
| Format a byte count as a human-readable string using IEC units. More... | |
| std::string | format_duration (double seconds, int precision=2, TrimZeroFraction trim_zero_fraction=TrimZeroFraction::YES) |
| Format a time duration as a human-readable string. More... | |
| std::int64_t | parse_nbytes (std::string_view text) |
| Parse a human-readable byte count into an integer number of bytes. More... | |
| std::size_t | parse_nbytes_unsigned (std::string_view text) |
| Parse a human-readable byte count into a non-negative number of bytes. More... | |
| std::size_t | parse_nbytes_or_percent (std::string_view text, double total_bytes) |
| Parse a byte quantity or percentage into an absolute byte count. More... | |
| Duration | parse_duration (std::string_view text) |
| Parse a human-readable time duration into seconds. More... | |
| template<typename T > | |
| T | parse_string (std::string const &text) |
| Specialization of parse_string for boolean values. More... | |
| template<> | |
| bool | parse_string (std::string const &text) |
| Specialization of parse_string for boolean values. More... | |
| std::optional< std::string > | parse_optional (std::string text) |
| Parse an optional string value. More... | |
| std::vector< std::string > | parse_string_list (std::string_view text, char delimiter=',') |
| Parse a delimited string into a list of trimmed substrings. More... | |
Variables | |
| constexpr bool | COMM_HAVE_UCXX = false |
| Whether RapidsMPF was built with the UCXX Communicator. | |
| constexpr bool | COMM_HAVE_MPI = false |
| Whether RapidsMPF was built with the MPI Communicator. | |
| constexpr std::array< MemoryType, 3 > | MEMORY_TYPES |
| All memory types sorted in decreasing order of preference. More... | |
| constexpr std::array< char const *, MEMORY_TYPES.size()> | MEMORY_TYPE_NAMES |
| Memory type names sorted to match MemoryType and MEMORY_TYPES. More... | |
| constexpr std::array< MemoryType, 2 > | SPILL_TARGET_MEMORY_TYPES |
| Memory types that are valid spill destinations in decreasing order of preference. More... | |
RAPIDS Multi-Processor interfaces.
SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0
Operation ID defined by the user. This allows users to concurrently execute multiple operations, and each operation will be identified by its OpID.
Although the type is int32, the number of distinct operations is limited to 2^20.
Definition at line 44 of file communicator.hpp.
The rank of a node (e.g. the rank of an MPI process), or world size (total number of ranks).
Definition at line 34 of file communicator.hpp.
Identifier for a stage of a communication operation.
Although the type is int32, the number of distinct stages is limited to 2^3.
Definition at line 53 of file communicator.hpp.
|
strong |
Policy controlling whether a memory reservation is allowed to overbook.
This enum is used throughout RapidsMPF to specify the overbooking behavior of a memory reservation request. The exact semantics depend on the specific API and execution context in which it is used.
| Enumerator | |
|---|---|
| NO | Overbooking is not allowed. |
| YES | Overbooking is allowed. |
Definition at line 39 of file buffer_resource.hpp.
|
strong |
Enum representing the type of memory sorted in decreasing order of preference.
| Enumerator | |
|---|---|
| DEVICE | Device memory. |
| PINNED_HOST | Pinned host memory. |
| HOST | Host memory. |
Definition at line 16 of file memory_type.hpp.
|
strong |
Control whether a zero fractional part is omitted when formatting values.
| Enumerator | |
|---|---|
| NO | Always keep the fractional part. |
| YES | Omit the fractional part when it consists only of zeros. |
Definition at line 43 of file string.hpp.
| void rapidsmpf::buffer_copy | ( | std::shared_ptr< Statistics > | statistics, |
| Buffer & | dst, | ||
| Buffer const & | src, | ||
| std::size_t | size, | ||
| std::ptrdiff_t | dst_offset = 0, |
||
| std::ptrdiff_t | src_offset = 0 |
||
| ) |
Asynchronously copy data between buffers.
Copies size bytes from src, starting at src_offset, into dst at dst_offset.
| statistics | Statistics object used to record the copy operation. Use Statistics::disabled() to skip recording. |
| dst | Destination buffer. |
| src | Source buffer. |
| size | Number of bytes to copy. |
| dst_offset | Byte offset into the destination buffer. |
| src_offset | Byte offset into the source buffer. |
| std::invalid_argument | If the requested range is out of bounds. |
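The bounds rule above can be illustrated on plain host memory. This is only a sketch, not the RapidsMPF implementation: `checked_copy` is a hypothetical helper that works on `std::vector<std::byte>` and copies synchronously, whereas `buffer_copy` operates on `Buffer` objects, records into `Statistics`, and is asynchronous. It shows one way the offsets and size could be validated before any bytes move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-only analogue of buffer_copy: validates both ranges, then
// copies `size` bytes from src (starting at src_offset) into dst (at dst_offset).
inline void checked_copy(std::vector<std::byte>& dst,
                         std::vector<std::byte> const& src,
                         std::size_t size,
                         std::ptrdiff_t dst_offset = 0,
                         std::ptrdiff_t src_offset = 0)
{
    auto const in_bounds = [size](std::ptrdiff_t offset, std::size_t capacity) {
        if (offset < 0) { return false; }
        auto const off = static_cast<std::size_t>(offset);
        // Phrased as a subtraction so that off + size cannot overflow.
        return off <= capacity && size <= capacity - off;
    };
    if (!in_bounds(dst_offset, dst.size()) || !in_bounds(src_offset, src.size())) {
        throw std::invalid_argument("checked_copy: requested range is out of bounds");
    }
    if (size == 0) { return; }
    std::memcpy(dst.data() + dst_offset, src.data() + src_offset, size);
}
```

Checking both ranges before copying means a failed call leaves the destination untouched.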
|
constexpr |
Backport of std::ranges::contains from C++23 for C++20.
Checks whether a range contains a given value.
| R | An input range type. |
| T | The type of the value to search for. |
| Proj | A projection function applied to each element before comparison. |
| range | The range to search. |
| value | The value to search for in the range. |
| proj | The projection to apply to each element before comparison. |
| void rapidsmpf::cuda_stream_join | ( | Range1 const & | downstreams, |
| Range2 const & | upstreams, | ||
| CudaEvent * | event = nullptr |
||
| ) |
Make downstream CUDA streams wait on upstream CUDA streams.
This call is asynchronous with respect to the host thread; no host-side blocking occurs.
| Range1 | Iterable whose elements are rmm::cuda_stream_view. |
| Range2 | Iterable whose elements are rmm::cuda_stream_view. |
| downstreams | Streams that must not run ahead. |
| upstreams | Streams whose already-enqueued work must complete first. |
| event | Optional CUDA event used for synchronization. A unique event per call is not required; the same event may be reused. If nullptr, a temporary event is created internally. The reason to provide an event is to avoid the small overhead of constructing a temporary one. |
Definition at line 34 of file cuda_stream.hpp.
|
inline |
Make a downstream CUDA stream wait on an upstream CUDA stream.
This call is asynchronous with respect to the host thread; no host-side blocking occurs.
Equivalent to calling the range overload with one upstream and one downstream.
| downstream | Stream that must not run ahead. |
| upstream | Stream whose already-enqueued work must complete first. |
| event | Optional CUDA event used for synchronization. A unique event per call is not required; the same event may be reused. If nullptr, a temporary event is created internally. The reason to provide an event is to avoid the small overhead of constructing a temporary one. |
If downstream and upstream are identical, this function is a no-op.
Definition at line 91 of file cuda_stream.hpp.
| std::size_t rapidsmpf::estimated_memory_usage | ( | cudf::column_view const & | col, |
| rmm::cuda_stream_view | stream | ||
| ) |
Estimate the memory usage of a column.
| col | The column to estimate the memory usage of. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| std::size_t rapidsmpf::estimated_memory_usage | ( | cudf::table_view const & | tbl, |
| rmm::cuda_stream_view | stream | ||
| ) |
Estimate the memory usage of a table.
| tbl | The table to estimate the memory usage of. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| std::pair<typename MapType::key_type, typename MapType::mapped_type> rapidsmpf::extract_item | ( | MapType & | map, |
| typename MapType::const_iterator | position | ||
| ) |
Extracts a key-value pair from a map, removing it from the map.
| MapType | The type of the associative container. |
| map | The map from which to extract the key-value pair. |
| position | Const iterator pointing to a node in the map. |
position).
| std::out_of_range | If the iterator is not found in the map. |
| std::pair<typename MapType::key_type, typename MapType::mapped_type> rapidsmpf::extract_item | ( | MapType & | map, |
| typename MapType::key_type const & | key | ||
| ) |
Extracts a key-value pair from a map, removing it from the map.
| MapType | The type of the associative container. |
| map | The map from which to extract the key-value pair. |
| key | The key to extract. |
| std::out_of_range | If the key is not found in the map. |
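For standard associative containers, the key-based overload can be pictured in terms of node handles. The following is a sketch under that assumption, not the library's code; `extract_item_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: remove `key` from `map` and return the (key, value)
// pair by moving both out of the extracted node.
template <typename MapType>
std::pair<typename MapType::key_type, typename MapType::mapped_type>
extract_item_sketch(MapType& map, typename MapType::key_type const& key)
{
    auto node = map.extract(key);  // the node handle is empty if the key is absent
    if (node.empty()) { throw std::out_of_range("key not found in map"); }
    return {std::move(node.key()), std::move(node.mapped())};
}
```

Because the node is unlinked from the container before its contents are moved, move-only mapped types can be taken out without a copy.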
| MapType::key_type rapidsmpf::extract_key | ( | MapType & | map, |
| typename MapType::const_iterator | position | ||
| ) |
Extracts a key from a map, removing the key-value pair.
| MapType | The type of the associative container. |
| map | The map from which to extract the key. |
| position | Const iterator pointing to a node in the map. |
position).
| std::out_of_range | If the key is not found in the map. |
| MapType::key_type rapidsmpf::extract_key | ( | MapType & | map, |
| typename MapType::key_type const & | key | ||
| ) |
Extracts a key from a map, removing the key-value pair.
| MapType | The type of the associative container. |
| map | The map from which to extract the key. |
| key | The key to extract. |
| std::out_of_range | If the key is not found in the map. |
| MapType::mapped_type rapidsmpf::extract_value | ( | MapType & | map, |
| typename MapType::const_iterator | position | ||
| ) |
Extracts the value associated with a specific key from a map, removing the key-value pair.
| MapType | The type of the associative container. |
| map | The map from which to extract the value. |
| position | Const iterator pointing to a node in the map. |
position).
| std::out_of_range | If the key is not found in the map. |
| MapType::mapped_type rapidsmpf::extract_value | ( | MapType & | map, |
| typename MapType::key_type const & | key | ||
| ) |
Extracts the value associated with a specific key from a map, removing the key-value pair.
| MapType | The type of the associative container. |
| map | The map from which to extract the value. |
| key | The key associated with the value to extract. |
| std::out_of_range | If the key is not found in the map. |
| std::string rapidsmpf::format_duration | ( | double | seconds, |
| int | precision = 2, |
||
| TrimZeroFraction | trim_zero_fraction = TrimZeroFraction::YES |
||
| ) |
Format a time duration as a human-readable string.
Converts a duration given in seconds into a scaled string representation using common time units such as ns, us, ms, s, min, h, and d.
The duration is accepted as a double to support both fractional seconds and very large values without overflow.
Negative values are supported and are formatted with a leading minus sign, which is useful when representing signed time deltas.
Decimal formatting is controlled by precision. When trim_zero_fraction is set to TrimZeroFraction::YES, the fractional part is omitted entirely if all decimal digits are zero. Otherwise, the specified number of decimal places is preserved.
| seconds | Time duration to format, in seconds. |
| precision | Number of decimal places to include in the formatted value. |
| trim_zero_fraction | Whether to omit the fractional part when it consists only of zeros. |
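To make the scaling and trimming rules concrete, here is a minimal sketch. It is not the RapidsMPF implementation: `format_duration_sketch` is a hypothetical name, the enum is a local copy, and the unit-selection rule (largest unit not exceeding the magnitude) and the space between number and unit are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

enum class TrimZeroFraction { NO, YES };  // local copy for this sketch

inline std::string format_duration_sketch(double seconds,
                                          int precision = 2,
                                          TrimZeroFraction trim = TrimZeroFraction::YES)
{
    struct Unit {
        double seconds_per_unit;
        char const* suffix;
    };
    // Ordered from largest to smallest so the first match is the best fit.
    static constexpr std::array<Unit, 7> units{{{86400.0, "d"},
                                                {3600.0, "h"},
                                                {60.0, "min"},
                                                {1.0, "s"},
                                                {1e-3, "ms"},
                                                {1e-6, "us"},
                                                {1e-9, "ns"}}};
    double const magnitude = std::abs(seconds);
    Unit unit = units.back();  // values below 1 ns (including 0) fall back to ns
    for (auto const& u : units) {
        if (magnitude >= u.seconds_per_unit) {
            unit = u;
            break;
        }
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", precision, seconds / unit.seconds_per_unit);
    std::string number{buf};
    if (trim == TrimZeroFraction::YES) {
        auto const dot = number.find('.');
        // Drop the fraction only when every decimal digit is zero.
        if (dot != std::string::npos &&
            number.find_first_not_of('0', dot + 1) == std::string::npos) {
            number.erase(dot);
        }
    }
    return number + " " + unit.suffix;
}
```

Under these assumptions, 120 seconds formats as a whole number of minutes with trimming enabled and keeps two decimals with `TrimZeroFraction::NO`.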
| std::string rapidsmpf::format_nbytes | ( | double | nbytes, |
| int | num_decimals = 2, |
||
| TrimZeroFraction | trim_zero_fraction = TrimZeroFraction::YES |
||
| ) |
Format a byte count as a human-readable string using IEC units.
Converts an integer byte count into a scaled string representation using binary (base-1024) units such as KiB, MiB, and GiB.
Negative values are supported and are formatted with a leading minus sign, which is useful when representing signed byte deltas or accounting values.
Decimal formatting is controlled by num_decimals. When trim_zero_fraction is set to TrimZeroFraction::YES, the fractional part is omitted entirely if all decimal digits are zero. Otherwise, the specified number of decimal places is preserved.
Examples:
| nbytes | Signed number of bytes to format, provided as a double to support any integer magnitude. |
| num_decimals | Number of decimal places to include in the formatted value. |
| trim_zero_fraction | Whether to omit the fractional part when it consists only of zeros. |
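The base-1024 scaling can be sketched as repeated division by 1024 until the value fits the current unit. The code below is an illustration under assumptions, not the library's implementation: `format_nbytes_sketch` is a hypothetical name, the enum is a local copy, and the unit list and the space before the unit are guesses about the output format.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>

enum class TrimZeroFraction { NO, YES };  // local copy for this sketch

inline std::string format_nbytes_sketch(double nbytes,
                                        int num_decimals = 2,
                                        TrimZeroFraction trim = TrimZeroFraction::YES)
{
    static constexpr std::array<char const*, 7> units{
        "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
    double value = nbytes;
    std::size_t idx = 0;
    // Scale down by 1024 while the magnitude still exceeds the current unit.
    while (std::abs(value) >= 1024.0 && idx + 1 < units.size()) {
        value /= 1024.0;
        ++idx;
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", num_decimals, value);
    std::string number{buf};
    if (trim == TrimZeroFraction::YES) {
        auto const dot = number.find('.');
        // Drop the fraction only when every decimal digit is zero.
        if (dot != std::string::npos &&
            number.find_first_not_of('0', dot + 1) == std::string::npos) {
            number.erase(dot);
        }
    }
    return number + " " + units[idx];
}
```

The sign is carried through the division, so negative byte deltas scale the same way as positive counts.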
|
noexcept |
Get the NUMA node ID associated with the calling CPU thread.
A NUMA (Non-Uniform Memory Access) node represents a group of CPU cores and memory that have faster access to each other than to memory attached to other nodes. On NUMA systems, binding allocations and threads to the same NUMA node can significantly reduce memory access latency and improve bandwidth.
This function returns the NUMA node on which the calling thread is currently executing, as determined by the operating system's CPU and memory topology. The value can change if the thread migrates between CPUs.
If NUMA support is not available on the system or cannot be queried, the function returns 0, which corresponds to the single implicit NUMA node on non-NUMA systems.
|
noexcept |
Get current NUMA node(s) for memory binding.
Queries the NUMA node associated with the CPU on which the calling thread is currently executing. This is a best-effort approach and may not be accurate in all cases.
Since processes are typically scheduled on CPUs that are local to their memory, using the CPU's NUMA node (via numa_node_of_cpu) provides a reasonable approximation that works well in practice for topology-aware binding scenarios. This intentionally avoids querying the process memory binding policy programmatically.
If NUMA support is not available or the NUMA node cannot be determined, the function returns a vector containing a single element, 0, which corresponds to the single implicit NUMA node on non-NUMA systems.
|
noexcept |
Get the total amount of host memory for a NUMA node.
| numa_id | NUMA node for which to query the total host memory. Defaults to the current NUMA node as returned by get_current_numa_node(). |
|
noexcept |
Get the total amount of system memory.
sysconf(_SC_PAGE_SIZE) or sysconf(_SC_PHYS_PAGES) fails.
|
inline |
Checks if the PinnedMemoryResource is supported for the current CUDA version.
RapidsMPF requires CUDA 12.6 or newer to support pinned memory resources.
Definition at line 43 of file pinned_memory_resource.hpp.
| bool rapidsmpf::is_running_under_valgrind | ( | ) |
Checks whether the application is running under Valgrind.
Returns true if the application is running under Valgrind, false otherwise.
|
constexpr noexcept |
Get the memory types with preference lower than or equal to mem_type.
The returned span reflects the predefined ordering used in MEMORY_TYPES, which lists memory types in decreasing order of preference.
| mem_type | The memory type used as the starting point. |
Definition at line 54 of file memory_type.hpp.
| std::unordered_map<MemoryType, BufferResource::MemoryAvailable> rapidsmpf::memory_available_from_options | ( | RmmResourceAdaptor * | mr, |
| config::Options | options | ||
| ) |
Construct a map of memory-available functions from configuration options.
| mr | Pointer to a memory resource adaptor. |
| options | Configuration options. |
|
inline |
Overloads the stream insertion operator for the Communicator class.
This function allows a description of a Communicator to be written to an output stream.
| os | The output stream to write to. |
| obj | The object to write. |
Definition at line 653 of file communicator.hpp.
| std::ostream& rapidsmpf::operator<< | ( | std::ostream & | os, |
| MemoryType | mem_type | ||
| ) |
Overload to write type name to the output stream.
| os | The output stream. |
| mem_type | The memory type whose name is written to the output stream. |
| std::istream& rapidsmpf::operator>> | ( | std::istream & | is, |
| MemoryType & | out | ||
| ) |
Overload to read a MemoryType value from an input stream.
Parsing is case-insensitive. Supported values are: "DEVICE", "PINNED_HOST", "PINNED", "PINNED-HOST", and "HOST".
If token extraction from the stream fails, the stream state is preserved. If extraction succeeds but the token does not represent a valid MemoryType, the stream failbit is set.
| is | The input stream. |
| out | The memory type read from the input stream. |
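The parsing rules above (case-insensitive tokens, failbit on an unknown token) can be sketched with a local copy of the enum. This is an illustration only; the real operator lives in the rapidsmpf namespace and may be implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

enum class MemoryType : int { DEVICE = 0, PINNED_HOST = 1, HOST = 2 };  // local copy

// Sketch of a case-insensitive extraction operator for the local enum.
inline std::istream& operator>>(std::istream& is, MemoryType& out)
{
    std::string token;
    if (!(is >> token)) { return is; }  // extraction failed: leave the state untouched
    std::transform(token.begin(), token.end(), token.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    if (token == "DEVICE") {
        out = MemoryType::DEVICE;
    } else if (token == "PINNED_HOST" || token == "PINNED" || token == "PINNED-HOST") {
        out = MemoryType::PINNED_HOST;
    } else if (token == "HOST") {
        out = MemoryType::HOST;
    } else {
        is.setstate(std::ios::failbit);  // token read, but not a valid MemoryType
    }
    return is;
}
```

Callers can then test the stream after extraction, as with any other formatted input, to distinguish a valid value from an unrecognized token.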
| Duration rapidsmpf::parse_duration | ( | std::string_view | text | ) |
Parse a human-readable time duration into seconds.
Parses a numeric value followed by an optional time unit suffix and converts it to a Duration, which represents a time interval in seconds as a double.
Supported units:
Units are case-insensitive. If no unit is provided, the value is interpreted as seconds.
The numeric portion may be specified using integer, decimal, or scientific notation (e.g. "1e3", "2.5E-2"). Negative values are supported.
| text | Time duration string to parse. |
| std::invalid_argument | If the string format is invalid or the unit is not recognized. |
| std::out_of_range | If the parsed value is not finite. |
| std::int64_t rapidsmpf::parse_nbytes | ( | std::string_view | text | ) |
Parse a human-readable byte count into an integer number of bytes.
Parses a numeric value followed by an optional unit suffix and converts it to a byte count. Both IEC (base-1024) and SI (base-1000) units are supported.
Supported units:
Units are case-insensitive. If no unit is provided, the value is interpreted as bytes.
The numeric portion may be specified using integer, decimal, or scientific notation (e.g. "1e6", "2.5E-3"). The final byte count is rounded to the nearest integer, with ties rounded away from zero.
| text | Byte count string to parse. |
| std::invalid_argument | If the string format is invalid or the unit is not recognized. |
| std::out_of_range | If the parsed value is not finite or the resulting byte count overflows a 64-bit signed integer. |
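The parsing steps described above (numeric prefix, optional unit, rounding with ties away from zero) can be sketched as follows. The unit table is an assumption, since the list of supported units is not reproduced here, and `parse_nbytes_sketch` is a hypothetical name rather than the library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

inline std::int64_t parse_nbytes_sketch(std::string_view text)
{
    std::string const input{text};
    char* end = nullptr;
    double const number = std::strtod(input.c_str(), &end);  // accepts 1e6, 2.5E-3, ...
    if (end == input.c_str()) { throw std::invalid_argument("no numeric value: " + input); }
    if (!std::isfinite(number)) { throw std::out_of_range("value is not finite: " + input); }

    // Whatever follows the number is the unit: trim it and lowercase it.
    std::string unit{end};
    auto const first = unit.find_first_not_of(" \t");
    auto const last = unit.find_last_not_of(" \t");
    unit = (first == std::string::npos) ? std::string{} : unit.substr(first, last - first + 1);
    std::transform(unit.begin(), unit.end(), unit.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });

    // Assumed unit table: SI units are base-1000, IEC units are base-1024.
    static std::unordered_map<std::string, double> const multipliers{
        {"", 1.0},        {"b", 1.0},
        {"kb", 1e3},      {"mb", 1e6},           {"gb", 1e9},
        {"kib", 1024.0},  {"mib", 1048576.0},    {"gib", 1073741824.0}};
    auto const it = multipliers.find(unit);
    if (it == multipliers.end()) { throw std::invalid_argument("unknown unit: " + unit); }

    double const bytes = std::round(number * it->second);  // ties round away from zero
    // Conservative range check: rejects magnitudes of 2^63 and above.
    if (std::abs(bytes) >= 9223372036854775808.0) {
        throw std::out_of_range("byte count overflows int64: " + input);
    }
    return static_cast<std::int64_t>(bytes);
}
```

`std::round` is used because it rounds halfway cases away from zero, which matches the tie-breaking rule stated above.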
| std::size_t rapidsmpf::parse_nbytes_or_percent | ( | std::string_view | text, |
| double | total_bytes | ||
| ) |
Parse a byte quantity or percentage into an absolute byte count.
The input may be a human-readable byte string (e.g. "1GiB", "512MB") or a percentage (e.g. "25%"). See parse_nbytes_unsigned for the exact parsing semantics of the numeric part.
If text ends with '%', the numeric part is first parsed using parse_nbytes_unsigned, then interpreted as a percentage of total_bytes.
Otherwise, text is parsed as an absolute byte value and returned as-is.
| text | Input string representing a byte quantity or percentage. |
| total_bytes | Total number of bytes used when text is a percentage. Must be positive. |
text.
| std::invalid_argument | If the input format is invalid, the value is negative, or if total_bytes is not positive. |
| std::out_of_range | If the parsed or computed value exceeds the representable range. |
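The two branches (percentage versus absolute value) can be sketched as below. To stay self-contained, the sketch replaces parse_nbytes_unsigned with `parse_count_stub`, a hypothetical stand-in that accepts plain numbers only and no unit suffixes; both function names are illustrative.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>

// Stand-in for parse_nbytes_unsigned: plain non-negative numbers only.
inline std::size_t parse_count_stub(std::string const& text)
{
    char* end = nullptr;
    double const v = std::strtod(text.c_str(), &end);
    if (end == text.c_str() || *end != '\0') {
        throw std::invalid_argument("invalid number: " + text);
    }
    if (!std::isfinite(v) || v > 9e18) { throw std::out_of_range("value out of range: " + text); }
    if (v < 0) { throw std::invalid_argument("negative value: " + text); }
    return static_cast<std::size_t>(std::llround(v));
}

inline std::size_t parse_nbytes_or_percent_sketch(std::string_view text, double total_bytes)
{
    if (!(total_bytes > 0)) { throw std::invalid_argument("total_bytes must be positive"); }
    while (!text.empty() && std::isspace(static_cast<unsigned char>(text.back()))) {
        text.remove_suffix(1);  // ignore trailing whitespace
    }
    if (!text.empty() && text.back() == '%') {
        text.remove_suffix(1);  // strip the percent sign, then parse the number
        auto const percent = parse_count_stub(std::string{text});
        return static_cast<std::size_t>(
            std::llround(total_bytes * (static_cast<double>(percent) / 100.0)));
    }
    return parse_count_stub(std::string{text});  // absolute value, returned as-is
}
```

Validating `total_bytes` first means an invalid total is reported even when the text is an absolute byte count, which is one possible reading of the exception list above.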
| std::size_t rapidsmpf::parse_nbytes_unsigned | ( | std::string_view | text | ) |
Parse a human-readable byte count into a non-negative number of bytes.
Parses a numeric value followed by an optional unit suffix and converts it to a byte count. Both IEC (base-1024) and SI (base-1000) units are supported.
Supported units:
Units are case-insensitive. If no unit is provided, the value is interpreted as bytes.
The numeric portion may be specified using integer, decimal, or scientific notation (e.g. "1e6", "2.5E-3"). The final byte count is rounded to the nearest integer, with ties rounded away from zero.
Negative values are not permitted.
| text | Byte count string to parse. |
| std::invalid_argument | If the string format is invalid, the unit is not recognized, or the parsed value is negative. |
| std::out_of_range | If the parsed value is not finite or overflows std::size_t. |
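The parsing rules above (number, optional case-insensitive unit, rounding with ties away from zero) can be illustrated with a small standalone sketch. It is not the RapidsMPF implementation: the function name is hypothetical, and the unit table is an assumed subset chosen for illustration, since the full list of supported units is not reproduced here.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <string_view>

// Illustrative stand-in, NOT the RapidsMPF implementation.
inline std::size_t sketch_parse_nbytes_unsigned(std::string_view text) {
    struct Unit {
        std::string_view name;
        double factor;
    };
    // Assumed (hypothetical) subset of SI and IEC units, stored in lowercase.
    static constexpr Unit units[] = {
        {"b", 1.0},      {"kb", 1e3},        {"mb", 1e6},           {"gb", 1e9},
        {"kib", 1024.0}, {"mib", 1048576.0}, {"gib", 1073741824.0},
    };
    std::string const input(text);
    std::size_t consumed = 0;
    // std::stod accepts integer, decimal, and scientific notation.
    double const value = std::stod(input, &consumed);
    // Whatever follows the number is the unit: strip whitespace, lowercase it.
    std::string unit = input.substr(consumed);
    unit.erase(std::remove_if(unit.begin(), unit.end(),
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               unit.end());
    std::transform(unit.begin(), unit.end(), unit.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    double factor = unit.empty() ? 1.0 : -1.0;  // no unit means bytes
    for (Unit const& u : units) {
        if (u.name == unit) { factor = u.factor; }
    }
    if (factor < 0.0) { throw std::invalid_argument("unrecognized unit"); }
    if (value < 0.0) { throw std::invalid_argument("negative byte count"); }
    // std::round rounds halfway cases away from zero.
    double const bytes = std::round(value * factor);
    double const limit = static_cast<double>(std::numeric_limits<std::size_t>::max());
    if (!std::isfinite(bytes) || bytes >= limit) {
        throw std::out_of_range("byte count does not fit in std::size_t");
    }
    return static_cast<std::size_t>(bytes);
}
```

In this sketch a bare "2.5" rounds up to 3 bytes, while "2.5E-3" rounds down to 0.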
| std::optional<std::string> rapidsmpf::parse_optional | ( | std::string | text | ) |
Parse an optional string value.
Returns std::nullopt if the input string represents a disabled value. Otherwise, the input string is returned unchanged.
Disabled values are matched case-insensitively and may include surrounding whitespace. Recognized values include: false, no, off, disable, disabled, none, n/a, and na.
| text | Input string to parse. |
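The matching rule above can be sketched as follows. This is a standalone illustration with a hypothetical name (sketch_parse_optional), not the library's code; it uses exactly the disabled values listed above and compares them against a trimmed, lowercased copy of the input.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Illustrative stand-in, NOT the RapidsMPF implementation.
inline std::optional<std::string> sketch_parse_optional(std::string text) {
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    // Build a trimmed, lowercased key; the original text is left untouched.
    auto const first = std::find_if_not(text.begin(), text.end(), is_space);
    auto const last = std::find_if_not(text.rbegin(), text.rend(), is_space).base();
    std::string key = (first < last) ? std::string(first, last) : std::string();
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static constexpr std::array<std::string_view, 8> disabled = {
        "false", "no", "off", "disable", "disabled", "none", "n/a", "na"};
    if (std::find(disabled.begin(), disabled.end(), std::string_view(key)) != disabled.end()) {
        return std::nullopt;
    }
    return text;  // returned unchanged, including any surrounding whitespace
}
```

How an empty string is treated by the real function is not stated above; this sketch simply returns it as a value.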
| bool rapidsmpf::parse_string | ( | std::string const & | text | ) |
Specialization of parse_string for boolean values.
Converts the input string to a boolean. This function handles common boolean representations such as true, false, on, off, yes, and no, as well as numeric representations (e.g., 0 or 1). The input is first checked for a numeric value using std::stoi; if that fails, it is lowercased and trimmed before matching against known textual representations.
| text | String to convert to a boolean. |
| std::invalid_argument | If the string cannot be interpreted as a boolean. |
Definition at line 242 of file string.hpp.
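The two-stage conversion described above (std::stoi first, then textual matching) might look roughly like the sketch below. The name sketch_parse_bool is hypothetical, and treating any non-zero integer as true is an assumption; the documentation only mentions 0 and 1.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Illustrative stand-in, NOT the RapidsMPF implementation.
inline bool sketch_parse_bool(std::string const& text) {
    try {
        // Numeric form first. Assumption: any non-zero integer counts as true.
        return std::stoi(text) != 0;
    } catch (std::invalid_argument const&) {
        // Not numeric: fall through to the textual forms.
    }
    // Trim surrounding whitespace, then lowercase.
    char const* const ws = " \t\n\r\f\v";
    std::string key;
    auto const first = text.find_first_not_of(ws);
    if (first != std::string::npos) {
        auto const last = text.find_last_not_of(ws);
        key = text.substr(first, last - first + 1);
    }
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (key == "true" || key == "on" || key == "yes") { return true; }
    if (key == "false" || key == "off" || key == "no") { return false; }
    throw std::invalid_argument("cannot interpret \"" + text + "\" as a boolean");
}
```

Note that std::stoi can also throw std::out_of_range for very large numbers; this sketch lets that exception propagate.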
| std::vector<std::string> rapidsmpf::parse_string_list | ( | std::string_view | text, |
| char | delimiter = ',' | ||
| ) |
Parse a delimited string into a list of trimmed substrings.
Splits the input string by the specified delimiter and returns a vector of trimmed tokens. Leading and trailing whitespace is removed from each token.
If the input string is empty or contains only whitespace, an empty vector is returned.
| text | Input string to parse. |
| delimiter | Character to use as the delimiter. Defaults to comma. |
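A minimal standalone sketch of the split-and-trim behavior described above follows. The name sketch_parse_string_list is hypothetical, and keeping empty tokens between adjacent delimiters is an assumption, since the documentation does not say how they are handled.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Illustrative stand-in, NOT the RapidsMPF implementation.
inline std::vector<std::string> sketch_parse_string_list(std::string_view text,
                                                         char delimiter = ',') {
    auto trim = [](std::string_view s) {
        constexpr std::string_view ws = " \t\n\r\f\v";
        auto const begin = s.find_first_not_of(ws);
        if (begin == std::string_view::npos) { return std::string_view{}; }
        auto const end = s.find_last_not_of(ws);
        return s.substr(begin, end - begin + 1);
    };
    std::vector<std::string> tokens;
    // Empty or whitespace-only input yields an empty vector.
    if (trim(text).empty()) { return tokens; }
    std::size_t start = 0;
    while (true) {
        auto const pos = text.find(delimiter, start);
        auto const count = (pos == std::string_view::npos) ? pos : pos - start;
        tokens.emplace_back(trim(text.substr(start, count)));
        if (pos == std::string_view::npos) { break; }
        start = pos + 1;
    }
    return tokens;
}
```

For example, " a , b,c " splits into the three tokens "a", "b", and "c".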
| std::unordered_map<shuffler::PartID, PackedData> rapidsmpf::partition_and_pack | ( | cudf::table_view const & | table, |
| std::vector< cudf::size_type > const & | columns_to_hash, | ||
| int | num_partitions, | ||
| cudf::hash_id | hash_function, | ||
| std::uint32_t | seed, | ||
| rmm::cuda_stream_view | stream, | ||
| BufferResource * | br, | ||
| AllowOverbooking | allow_overbooking = AllowOverbooking::YES | ||
| ) |
Partitions rows from the input table into multiple packed (serialized) tables.
| table | The table to partition. |
| columns_to_hash | Indices of input columns to hash. |
| num_partitions | The number of partitions to use. |
| hash_function | Hash function to use. |
| seed | Seed value to the hash function. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| br | Buffer resource for memory allocations. |
| allow_overbooking | If true, allow overbooking (true by default). TODO: disable this by default, see https://github.com/rapidsmpf/rapidsmpf/issues/449 |
| std::out_of_range | If an index in columns_to_hash is invalid. |
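The idea behind hash partitioning can be shown on plain host data. The sketch below is a conceptual stand-in only: it uses std::hash on integer keys instead of a cudf hash function on device columns, the name sketch_hash_partition is hypothetical, and omitting empty partitions from the result is an assumption rather than documented behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Conceptual host-side stand-in, NOT the RapidsMPF/cudf implementation.
// Each key is assigned to partition hash(seed, key) % num_partitions.
inline std::unordered_map<int, std::vector<int>> sketch_hash_partition(
    std::vector<int> const& keys, int num_partitions, std::uint32_t seed) {
    if (num_partitions <= 0) {
        throw std::invalid_argument("num_partitions must be positive");
    }
    std::unordered_map<int, std::vector<int>> partitions;
    for (int key : keys) {
        // Fold the seed into the hashed value (std::hash stands in for the real hash).
        std::uint64_t const mixed =
            (static_cast<std::uint64_t>(seed) << 32) ^ static_cast<std::uint32_t>(key);
        std::size_t const h = std::hash<std::uint64_t>{}(mixed);
        int const pid = static_cast<int>(h % static_cast<std::size_t>(num_partitions));
        partitions[pid].push_back(key);
    }
    return partitions;
}
```

Rows with equal keys always land in the same partition, which is the property a shuffle relies on.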
| std::pair<std::vector<cudf::table_view>, std::unique_ptr<cudf::table> > rapidsmpf::partition_and_split | ( | cudf::table_view const & | table, |
| std::vector< cudf::size_type > const & | columns_to_hash, | ||
| int | num_partitions, | ||
| cudf::hash_id | hash_function, | ||
| std::uint32_t | seed, | ||
| rmm::cuda_stream_view | stream, | ||
| BufferResource * | br, | ||
| AllowOverbooking | allow_overbooking = AllowOverbooking::YES | ||
| ) |
Partitions rows from the input table into multiple output tables.
| table | The table to partition. |
| columns_to_hash | Indices of input columns to hash. |
| num_partitions | The number of partitions. |
| hash_function | Hash function to use. |
| seed | Seed value to the hash function. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| br | Buffer resource for memory allocations. |
| allow_overbooking | If true, allow overbooking (true by default) |
| std::out_of_range | If an index in columns_to_hash is invalid. |
| std::optional<Duration> rapidsmpf::periodic_spill_check_from_options | ( | config::Options | options | ) |
Get the periodic_spill_check parameter from configuration options.
| options | Configuration options. |
|
constexpr |
Safely casts a numeric value to another type with overflow checking.
For integral conversions, the value must be representable in the destination type or an exception is thrown.
For conversions involving floating point types, overflow and underflow follow standard floating point semantics. The result may become inf or -inf, or lose precision, without throwing.
| To | The destination type. |
| From | The source type. |
| value | The value to cast. |
| loc | Source location (automatically captured). |
| std::overflow_error | if an integral value cannot be represented in the destination type. |
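The integral case described above can be approximated with a round-trip check. The sketch below is a standalone illustration under assumptions: the name sketch_checked_cast is hypothetical, it handles integral types only, and it omits the source-location parameter.

```cpp
#include <stdexcept>
#include <type_traits>

// Illustrative stand-in for the integral case only, NOT the RapidsMPF implementation.
template <typename To, typename From>
constexpr To sketch_checked_cast(From value) {
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "this sketch covers integral conversions only");
    To const result = static_cast<To>(value);
    // The value is representable if it survives the round trip and keeps its sign.
    bool const round_trips = static_cast<From>(result) == value;
    bool const same_sign = (result < To{}) == (value < From{});
    if (!round_trips || !same_sign) {
        throw std::overflow_error("value not representable in destination type");
    }
    return result;
}
```

The sign comparison catches cases such as casting -1 to an unsigned type, where the round trip alone would succeed.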
| std::vector<PackedData> rapidsmpf::spill_partitions | ( | std::vector< PackedData > && | partitions, |
| BufferResource * | br | ||
| ) |
Spill partitions from device memory to host memory.
Moves the buffer of each PackedData from device memory to host memory using the provided buffer resource and the buffer's CUDA stream. Partitions that are already in host memory are passed through unchanged.
For device-resident partitions, a host memory reservation is made before moving the buffer. If the reservation fails due to insufficient host memory, an exception is thrown. Overbooking is not allowed.
| partitions | The partitions to spill. |
| br | Buffer resource used to reserve host memory and perform the move. |
Returns a vector of PackedData, where each buffer resides in host memory.
| rapidsmpf::reservation_error | If host memory reservation fails. |
| std::unordered_map<shuffler::PartID, PackedData> rapidsmpf::split_and_pack | ( | cudf::table_view const & | table, |
| std::vector< cudf::size_type > const & | splits, | ||
| rmm::cuda_stream_view | stream, | ||
| BufferResource * | br, | ||
| AllowOverbooking | allow_overbooking = AllowOverbooking::YES | ||
| ) |
Splits rows from the input table into multiple packed (serialized) tables.
| table | The table to split and pack into partitions. |
| splits | The split points, equivalent to cudf::split(), i.e. one less than the number of result partitions. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| br | Buffer resource for memory allocations. |
| allow_overbooking | If true, allow overbooking (true by default). TODO: disable this by default, see https://github.com/rapidsmpf/rapidsmpf/issues/449 |
| std::out_of_range | if the splits are invalid. |
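The relationship between split points and result partitions (N split points give N + 1 partitions) can be illustrated on a plain host vector. This is a conceptual stand-in with a hypothetical name, using std::vector rows instead of a cudf table, and it does not perform any packing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Conceptual host-side stand-in, NOT the RapidsMPF/cudf implementation.
inline std::vector<std::vector<int>> sketch_split_rows(
    std::vector<int> const& rows, std::vector<std::size_t> const& splits) {
    std::vector<std::vector<int>> parts;
    std::size_t begin = 0;
    for (std::size_t end : splits) {
        // Split points must be non-decreasing and within the row count.
        if (end < begin || end > rows.size()) {
            throw std::out_of_range("invalid split point");
        }
        parts.emplace_back(rows.begin() + static_cast<std::ptrdiff_t>(begin),
                           rows.begin() + static_cast<std::ptrdiff_t>(end));
        begin = end;
    }
    // The final partition runs from the last split point to the end.
    parts.emplace_back(rows.begin() + static_cast<std::ptrdiff_t>(begin), rows.end());
    return parts;
}
```

With five rows and split points {2, 3}, the sketch produces three partitions covering rows [0, 2), [2, 3), and [3, 5).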
| std::string rapidsmpf::str | ( | cudf::column_view | col, |
| cudf::size_type | index, | ||
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), | ||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() | ||
| ) |
Converts the element at a specific index in a cudf::column_view to a string.
| col | The column view containing the data. |
| index | The index of the element to convert. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| mr | Memory resource for device memory allocation. |
| std::string rapidsmpf::str | ( | cudf::column_view | col, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), | ||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() | ||
| ) |
Converts all elements in a cudf::column_view to a string.
| col | The column view containing the data. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| mr | Memory resource for device memory allocation. |
| std::string rapidsmpf::str | ( | cudf::table_view | tbl, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), | ||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() | ||
| ) |
Converts all rows in a cudf::table_view to a string.
| tbl | The table view containing the data. |
| stream | CUDA stream used for device memory operations and kernel launches. |
| mr | Memory resource for device memory allocation. |
| std::shared_ptr<rmm::cuda_stream_pool> rapidsmpf::stream_pool_from_options | ( | config::Options | options | ) |
Get a new CUDA stream pool from configuration options.
| options | Configuration options. |
| std::string rapidsmpf::to_lower | ( | std::string_view | text | ) |
Converts the specified string to lowercase.
| text | The input string to be processed. |
|
constexpr |
Get the name of a MemoryType.
| mem_type | The memory type. |
Definition at line 75 of file memory_type.hpp.
| std::string rapidsmpf::to_upper | ( | std::string_view | text | ) |
Converts the specified string to uppercase.
| text | The input string to be processed. |
| auto rapidsmpf::to_vector | ( | MapType && | map | ) |
Converts a map-like associative container to a vector by moving the values and discarding the keys.
| MapType | The type of the map-like associative container. Must provide a mapped_type and support range-based for-loops. |
| map | The map whose values will be moved into the resulting vector. Keys are ignored. |
| std::string rapidsmpf::trim | ( | std::string_view | text | ) |
Trims whitespace from both ends of the specified string.
| text | The input string to be processed. |
| std::unique_ptr<cudf::table> rapidsmpf::unpack_and_concat | ( | std::vector< PackedData > && | partitions, |
| rmm::cuda_stream_view | stream, | ||
| BufferResource * | br, | ||
| AllowOverbooking | allow_overbooking = AllowOverbooking::YES | ||
| ) |
Unpack (deserialize) input partitions and concatenate them into a single table.
Empty partitions are ignored.
The unpacking of each partition is stream-ordered on that partition's own CUDA stream. The returned table is stream-ordered on the provided stream and synchronized with the unpacking.
| partitions | Packed input tables (partitions). |
| stream | CUDA stream on which concatenation occurs and on which the resulting table is ordered. |
| br | Buffer resource used for memory allocations. |
| allow_overbooking | If true, allow overbooking (true by default). |
| rapidsmpf::reservation_error | If the buffer resource cannot reserve enough memory to concatenate all partitions. |
| std::logic_error | If the partitions are not in device memory. |
| std::vector<PackedData> rapidsmpf::unspill_partitions | ( | std::vector< PackedData > && | partitions, |
| BufferResource * | br, | ||
| AllowOverbooking | allow_overbooking | ||
| ) |
Move spilled partitions (i.e., packed tables in host memory) back to device memory.
Each partition is inspected to determine whether its buffer resides in device memory. Buffers already in device memory are left untouched. Host-resident buffers are moved to device memory using the provided buffer resource and the buffer's CUDA stream.
If insufficient device memory is available, the buffer resource's spill manager is invoked to free memory. If overbooking occurs and spilling fails to reclaim enough memory, behavior depends on the allow_overbooking flag.
| partitions | The partitions to unspill, potentially containing host-resident data. |
| br | Buffer resource responsible for memory reservation and spills. |
| allow_overbooking | If false, ensures enough memory is freed to satisfy the reservation; otherwise, allows overbooking even if spilling was insufficient. |
Returns a vector of PackedData, each with a buffer in device memory.
| rapidsmpf::reservation_error | If overbooking exceeds the amount spilled and allow_overbooking is false. |
|
constexpr |
Memory type names sorted to match MemoryType and MEMORY_TYPES.
Definition at line 28 of file memory_type.hpp.
|
constexpr |
All memory types sorted in decreasing order of preference.
Definition at line 23 of file memory_type.hpp.
|
constexpr |
Memory types that are valid spill destinations in decreasing order of preference.
This array defines the preferred targets for spilling when device memory is insufficient. The ordering reflects the policy of spilling in RapidsMPF, where earlier entries are considered more desirable spill destinations.
Definition at line 40 of file memory_type.hpp.