Memory Resources

namespace mr
group memory_resources

Typedefs

using device_resource_ref = cuda::mr::resource_ref<cuda::mr::device_accessible>

Alias for a cuda::mr::resource_ref with the property cuda::mr::device_accessible.

using device_async_resource_ref = cuda::mr::async_resource_ref<cuda::mr::device_accessible>

Alias for a cuda::mr::async_resource_ref with the property cuda::mr::device_accessible.

using host_resource_ref = cuda::mr::resource_ref<cuda::mr::host_accessible>

Alias for a cuda::mr::resource_ref with the property cuda::mr::host_accessible.

using host_async_resource_ref = cuda::mr::async_resource_ref<cuda::mr::host_accessible>

Alias for a cuda::mr::async_resource_ref with the property cuda::mr::host_accessible.

using host_device_resource_ref = cuda::mr::resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>

Alias for a cuda::mr::resource_ref with the properties cuda::mr::host_accessible and cuda::mr::device_accessible.

using host_device_async_resource_ref = cuda::mr::async_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>

Alias for a cuda::mr::async_resource_ref with the properties cuda::mr::host_accessible and cuda::mr::device_accessible.

Functions

inline device_memory_resource *get_per_device_resource(cuda_device_id device_id)

Get the resource for the specified device.

Returns a pointer to the device_memory_resource for the specified device. The initial resource is a cuda_memory_resource.

device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.

This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The returned device_memory_resource should only be used when CUDA device device_id is the current device (e.g. set using cudaSetDevice()). The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.

Parameters:

device_id – The id of the target device

Returns:

Pointer to the current device_memory_resource for device device_id

inline device_memory_resource *set_per_device_resource(cuda_device_id device_id, device_memory_resource *new_mr)

Set the device_memory_resource for the specified device.

If new_mr is not nullptr, sets the memory resource pointer for the device specified by id to new_mr. Otherwise, resets ids resource to the initial cuda_memory_resource.

id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.

The object pointed to by new_mr must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.

This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The resource passed in new_mr must have been created when device id was the current CUDA device (e.g. set using cudaSetDevice()). The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.

Parameters:
  • device_id – The id of the target device

  • new_mr – If not nullptr, pointer to new device_memory_resource to use as new resource for id

Returns:

Pointer to the previous memory resource for id

inline device_memory_resource *get_current_device_resource()

Get the memory resource for the current device.

Returns a pointer to the resource set for the current device. The initial resource is a cuda_memory_resource.

The “current device” is the device returned by cudaGetDevice.

This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The returned device_memory_resource should only be used with the current CUDA device. Changing the current device (e.g. using cudaSetDevice()) and then using the returned resource can result in undefined behavior. The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.

Returns:

Pointer to the resource for the current device

inline device_memory_resource *set_current_device_resource(device_memory_resource *new_mr)

Set the memory resource for the current device.

If new_mr is not nullptr, sets the resource pointer for the current device to new_mr. Otherwise, resets the resource to the initial cuda_memory_resource.

The “current device” is the device returned by cudaGetDevice.

The object pointed to by new_mr must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.

This function is thread-safe with respect to concurrent calls to set_per_device_resource, get_per_device_resource, get_current_device_resource, and set_current_device_resource. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The resource passed in new_mr must have been created for the current CUDA device. The behavior of a device_memory_resource is undefined if used while the active CUDA device is a different device from the one that was active when the device_memory_resource was created.

Parameters:

new_mr – If not nullptr, pointer to new resource to use for the current device

Returns:

Pointer to the previous resource for the current device

inline device_async_resource_ref get_per_device_resource_ref(cuda_device_id device_id)

Get the device_async_resource_ref for the specified device.

Returns a device_async_resource_ref for the specified device. The initial resource_ref references a cuda_memory_resource.

device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The returned device_async_resource_ref should only be used when CUDA device device_id is the current device (e.g. set using cudaSetDevice()). The behavior of a device_async_resource_ref is undefined if used while the active CUDA device is a different device from the one that was active when the memory resource was created.

Parameters:

device_id – The id of the target device

Returns:

The current device_async_resource_ref for device device_id

inline device_async_resource_ref set_per_device_resource_ref(cuda_device_id device_id, device_async_resource_ref new_resource_ref)

Set the device_async_resource_ref for the specified device to new_resource_ref

device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.

The object referenced by new_resource_ref must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and `reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The resource passed in new_resource_ref must have been created when device device_id was the current CUDA device (e.g. set using cudaSetDevice()). The behavior of a device_async_resource_ref is undefined if used while the active CUDA device is a different device from the one that was active when the memory resource was created.

Parameters:
  • device_id – The id of the target device

  • new_resource_ref – new device_async_resource_ref to use as new resource for device_id

Returns:

The previous device_async_resource_ref for device_id

inline device_async_resource_ref get_current_device_resource_ref()

Get the device_async_resource_ref for the current device.

Returns the device_async_resource_ref set for the current device. The initial resource_ref references a cuda_memory_resource.

The “current device” is the device returned by cudaGetDevice.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and `reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The returned device_async_resource_ref should only be used with the current CUDA device. Changing the current device (e.g. using cudaSetDevice()) and then using the returned resource_ref can result in undefined behavior. The behavior of a device_async_resource_ref is undefined if used while the active CUDA device is a different device from the one that was active when the memory resource was created.

Returns:

device_async_resource_ref active for the current device

inline device_async_resource_ref set_current_device_resource_ref(device_async_resource_ref new_resource_ref)

Set the device_async_resource_ref for the current device.

The “current device” is the device returned by cudaGetDevice.

The object referenced by new_resource_ref must outlive the last use of the resource, otherwise behavior is undefined. It is the caller’s responsibility to maintain the lifetime of the resource object.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and `reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Note

The resource passed in new_resource must have been created for the current CUDA device. The behavior of a device_async_resource_ref is undefined if used while the active CUDA device is a different device from the one that was active when the memory resource was created.

Parameters:

new_resource_ref – New device_async_resource_ref to use for the current device

Returns:

Previous device_async_resource_ref for the current device

inline device_async_resource_ref reset_per_device_resource_ref(cuda_device_id device_id)

Reset the device_async_resource_ref for the specified device to the initial resource.

Resets to a reference to the initial cuda_memory_resource.

device_id.value() must be in the range [0, cudaGetDeviceCount()), otherwise behavior is undefined.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and `reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Parameters:

device_id – The id of the target device

Returns:

Previous device_async_resource_ref for device_id

inline device_async_resource_ref reset_current_device_resource_ref()

Reset the device_async_resource_ref for the current device to the initial resource.

Resets to a reference to the initial cuda_memory_resource. The “current device” is the device returned by cudaGetDevice.

This function is thread-safe with respect to concurrent calls to set_per_device_resource_ref, get_per_device_resource_ref, get_current_device_resource_ref, set_current_device_resource_ref and `reset_current_device_resource_ref. Concurrent calls to any of these functions will result in a valid state, but the order of execution is undefined.

Returns:

Previous device_async_resource_ref for device_id

template<class Resource>
device_async_resource_ref to_device_async_resource_ref_checked(Resource *res)

Convert pointer to memory resource into device_async_resource_ref, checking for nullptr

Template Parameters:

Resource – The type of the memory resource.

Parameters:

res – A pointer to the memory resource.

Throws:

std::logic_error – if the memory resource pointer is null.

Returns:

A device_async_resource_ref to the memory resource.

Variables

template<class Resource, class = void>
constexpr bool is_resource_adaptor = false

Concept to check whether a resource is a resource adaptor by checking for get_upstream_resource.

class pinned_host_memory_resource
#include <pinned_host_memory_resource.hpp>

Memory resource class for allocating pinned host memory.

This class uses CUDA’s cudaHostAlloc to allocate pinned host memory. It implements the cuda::mr::memory_resource and cuda::mr::device_memory_resource concepts, and the cuda::mr::host_accessible and cuda::mr::device_accessible properties.

Public Functions

inline bool operator==(const pinned_host_memory_resource&) const

true if the specified resource is the same type as this resource.

Returns:

true if the specified resource is the same type as this resource.

inline bool operator!=(const pinned_host_memory_resource&) const

true if the specified resource is not the same type as this resource, otherwise false.

Returns:

true if the specified resource is not the same type as this resource, otherwise false.

Public Static Functions

static inline void *allocate(std::size_t bytes, [[maybe_unused]] std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)

Allocates pinned host memory of size at least bytes bytes.

Throws:
  • rmm::out_of_memory – if the requested allocation could not be fulfilled due to to a CUDA out of memory error.

  • rmm::bad_alloc – if the requested allocation could not be fulfilled due to any other reason.

Parameters:
  • bytes – The size, in bytes, of the allocation.

  • alignment – Alignment in bytes. Default alignment is used if unspecified.

Returns:

Pointer to the newly allocated memory.

static inline void deallocate(void *ptr, std::size_t bytes, std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT) noexcept

Deallocate memory pointed to by ptr of size bytes bytes.

Parameters:
  • ptr – Pointer to be deallocated.

  • bytes – Size of the allocation.

  • alignment – Alignment in bytes. Default alignment is used if unspecified.

static inline void *allocate_async(std::size_t bytes, [[maybe_unused]] cuda::stream_ref stream)

Allocates pinned host memory of size at least bytes bytes.

Note

Stream argument is ignored and behavior is identical to allocate.

Throws:
  • rmm::out_of_memory – if the requested allocation could not be fulfilled due to to a CUDA out of memory error.

  • rmm::bad_alloc – if the requested allocation could not be fulfilled due to any other error.

Parameters:
  • bytes – The size, in bytes, of the allocation.

  • stream – CUDA stream on which to perform the allocation (ignored).

Returns:

Pointer to the newly allocated memory.

static inline void *allocate_async(std::size_t bytes, std::size_t alignment, [[maybe_unused]] cuda::stream_ref stream)

Allocates pinned host memory of size at least bytes bytes and alignment alignment.

Note

Stream argument is ignored and behavior is identical to allocate.

Throws:
  • rmm::out_of_memory – if the requested allocation could not be fulfilled due to to a CUDA out of memory error.

  • rmm::bad_alloc – if the requested allocation could not be fulfilled due to any other error.

Parameters:
  • bytes – The size, in bytes, of the allocation.

  • alignment – Alignment in bytes.

  • stream – CUDA stream on which to perform the allocation (ignored).

Returns:

Pointer to the newly allocated memory.

static inline void deallocate_async(void *ptr, std::size_t bytes, [[maybe_unused]] cuda::stream_ref stream) noexcept

Deallocate memory pointed to by ptr of size bytes bytes.

Note

Stream argument is ignored and behavior is identical to deallocate.

Parameters:
  • ptr – Pointer to be deallocated.

  • bytes – Size of the allocation.

  • stream – CUDA stream on which to perform the deallocation (ignored).

static inline void deallocate_async(void *ptr, std::size_t bytes, std::size_t alignment, [[maybe_unused]] cuda::stream_ref stream) noexcept

Deallocate memory pointed to by ptr of size bytes bytes and alignment alignment bytes.

Note

Stream argument is ignored and behavior is identical to deallocate.

Parameters:
  • ptr – Pointer to be deallocated.

  • bytes – Size of the allocation.

  • alignment – Alignment in bytes.

  • stream – CUDA stream on which to perform the deallocation (ignored).

Friends

inline friend void get_property(pinned_host_memory_resource const&, cuda::mr::device_accessible) noexcept

Enables the cuda::mr::device_accessible property.

This property declares that a pinned_host_memory_resource provides device accessible memory

inline friend void get_property(pinned_host_memory_resource const&, cuda::mr::host_accessible) noexcept

Enables the cuda::mr::host_accessible property.

This property declares that a pinned_host_memory_resource provides host accessible memory

group device_memory_resources

Typedefs

using allocate_callback_t = std::function<void*(std::size_t, cuda_stream_view, void*)>

Callback function type used by callback memory resource for allocation.

The signature of the callback function is: `void* allocate_callback_t(std::size_t bytes, cuda_stream_view stream, void* arg);

  • Returns a pointer to an allocation of at least bytes usable immediately on stream. The stream-ordered behavior requirements are identical to device_memory_resource::allocate.

  • This signature is compatible with do_allocate but adds the extra function parameter arg. The arg is provided to the constructor of the callback_memory_resource and will be forwarded along to every invocation of the callback function.

using deallocate_callback_t = std::function<void(void*, std::size_t, cuda_stream_view, void*)>

Callback function type used by callback_memory_resource for deallocation.

The signature of the callback function is: `void deallocate_callback_t(void* ptr, std::size_t bytes, cuda_stream_view stream, void* arg);

  • Deallocates memory pointed to by ptr. bytes specifies the size of the allocation in bytes, and must equal the value of bytes that was passed to the allocate callback function. The stream-ordered behavior requirements are identical to device_memory_resource::deallocate.

  • This signature is compatible with do_deallocate but adds the extra function parameter arg. The arg is provided to the constructor of the callback_memory_resource and will be forwarded along to every invocation of the callback function.

Functions

template<typename T, typename U>
bool operator==(polymorphic_allocator<T> const &lhs, polymorphic_allocator<U> const &rhs)

Compare two polymorphic_allocators for equality.

Two polymorphic_allocators are equal if their underlying memory resources compare equal.

Template Parameters:
  • T – Type of the first allocator

  • U – Type of the second allocator

Parameters:
  • lhs – The first allocator to compare

  • rhs – The second allocator to compare

Returns:

true if the two allocators are equal, false otherwise

template<typename T, typename U>
bool operator!=(polymorphic_allocator<T> const &lhs, polymorphic_allocator<U> const &rhs)

Compare two polymorphic_allocators for inequality.

Two polymorphic_allocators are not equal if their underlying memory resources compare not equal.

Template Parameters:
  • T – Type of the first allocator

  • U – Type of the second allocator

Parameters:
  • lhs – The first allocator to compare

  • rhs – The second allocator to compare

Returns:

true if the two allocators are not equal, false otherwise

template<typename A, typename O>
bool operator==(stream_allocator_adaptor<A> const &lhs, stream_allocator_adaptor<O> const &rhs)

Compare two stream_allocator_adaptors for equality.

Two stream_allocator_adaptors are equal if their underlying allocators compare equal.

Template Parameters:
  • A – Type of the first allocator

  • O – Type of the second allocator

Parameters:
  • lhs – The first allocator to compare

  • rhs – The second allocator to compare

Returns:

true if the two allocators are equal, false otherwise

template<typename A, typename O>
bool operator!=(stream_allocator_adaptor<A> const &lhs, stream_allocator_adaptor<O> const &rhs)

Compare two stream_allocator_adaptors for inequality.

Two stream_allocator_adaptors are not equal if their underlying allocators compare not equal.

Template Parameters:
  • A – Type of the first allocator

  • O – Type of the second allocator

Parameters:
  • lhs – The first allocator to compare

  • rhs – The second allocator to compare

Returns:

true if the two allocators are not equal, false otherwise

template<typename Upstream>
class arena_memory_resource : public rmm::mr::device_memory_resource
#include <arena_memory_resource.hpp>

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with CUDA per-thread default stream.

GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.

Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.

In real-world applications, allocation sizes tend to follow a power law distribution in which large allocations are rare, but small ones quite common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.

This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.

See also

Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.

See also

Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM Sigplan Notices, 35(11), 117-128.

See also

Evans, J. (2006, April). A scalable concurrent malloc (3) implementation for FreeBSD. In Proc. of the bsdcan conference, ottawa, canada.

Template Parameters:

Upstream – Memory resource to use for allocating memory for the global arena. Implements rmm::mr::device_memory_resource interface.

Public Functions

inline explicit arena_memory_resource(device_async_resource_ref upstream_mr, std::optional<std::size_t> arena_size = std::nullopt, bool dump_log_on_failure = false)

Construct an arena_memory_resource.

Parameters:
  • upstream_mr – The memory resource from which to allocate blocks for the global arena.

  • arena_size – Size in bytes of the global arena. Defaults to half of the available memory on the current device.

  • dump_log_on_failure – If true, dump memory log when running out of memory.

inline explicit arena_memory_resource(Upstream *upstream_mr, std::optional<std::size_t> arena_size = std::nullopt, bool dump_log_on_failure = false)

Construct an arena_memory_resource.

Throws:

rmm::logic_error – if upstream_mr == nullptr.

Parameters:
  • upstream_mr – The memory resource from which to allocate blocks for the global arena.

  • arena_size – Size in bytes of the global arena. Defaults to half of the available memory on the current device.

  • dump_log_on_failure – If true, dump memory log when running out of memory.

template<typename Upstream>
class binning_memory_resource : public rmm::mr::device_memory_resource
#include <binning_memory_resource.hpp>

Allocates memory from upstream resources associated with bin sizes.

Template Parameters:

UpstreamResource – memory_resource to use for allocations that don’t fall within any configured bin size. Implements rmm::mr::device_memory_resource interface.

Public Functions

inline explicit binning_memory_resource(device_async_resource_ref upstream_resource)

Construct a new binning memory resource object.

Initially has no bins, so simply uses the upstream_resource until bin resources are added with add_bin.

Parameters:

upstream_resource – The upstream memory resource used to allocate bin pools.

inline explicit binning_memory_resource(Upstream *upstream_resource)

Construct a new binning memory resource object.

Initially has no bins, so simply uses the upstream_resource until bin resources are added with add_bin.

Throws:

rmm::logic_error – if upstream_resource is nullptr

Parameters:

upstream_resource – The upstream memory resource used to allocate bin pools.

inline binning_memory_resource(device_async_resource_ref upstream_resource, int8_t min_size_exponent, int8_t max_size_exponent)

Construct a new binning memory resource object with a range of initial bins.

Constructs a new binning memory resource and adds bins backed by fixed_size_memory_resource in the range [2^min_size_exponent, 2^max_size_exponent]. For example if min_size_exponent==18 and max_size_exponent==22, creates bins of sizes 256KiB, 512KiB, 1024KiB, 2048KiB and 4096KiB.

Parameters:
  • upstream_resource – The upstream memory resource used to allocate bin pools.

  • min_size_exponent – The minimum base-2 exponent bin size.

  • max_size_exponent – The maximum base-2 exponent bin size.

inline binning_memory_resource(Upstream *upstream_resource, int8_t min_size_exponent, int8_t max_size_exponent)

Construct a new binning memory resource object with a range of initial bins.

Constructs a new binning memory resource and adds bins backed by fixed_size_memory_resource in the range [2^min_size_exponent, 2^max_size_exponent]. For example if min_size_exponent==18 and max_size_exponent==22, creates bins of sizes 256KiB, 512KiB, 1024KiB, 2048KiB and 4096KiB.

Throws:

rmm::logic_error – if upstream_resource is nullptr

Parameters:
  • upstream_resource – The upstream memory resource used to allocate bin pools.

  • min_size_exponent – The minimum base-2 exponent bin size.

  • max_size_exponent – The maximum base-2 exponent bin size.

~binning_memory_resource() override = default

Destroy the binning_memory_resource and free all memory allocated from the upstream resource.

inline device_async_resource_ref get_upstream_resource() const noexcept

device_async_resource_ref to the upstream resource

Returns:

device_async_resource_ref to the upstream resource

inline void add_bin(std::size_t allocation_size, std::optional<device_async_resource_ref> bin_resource = std::nullopt)

Add a bin allocator to this resource.

Adds bin_resource if provided; otherwise constructs and adds a fixed_size_memory_resource.

This bin will be used for any allocation smaller than allocation_size that is larger than the next smaller bin’s allocation size.

If there is already a bin of the specified size nothing is changed.

This function is not thread safe.

Parameters:
  • allocation_size – The maximum size that this bin allocates

  • bin_resource – The memory resource for the bin

class callback_memory_resource : public rmm::mr::device_memory_resource
#include <callback_memory_resource.hpp>

A device memory resource that uses the provided callbacks for memory allocation and deallocation.

Public Functions

inline callback_memory_resource(allocate_callback_t allocate_callback, deallocate_callback_t deallocate_callback, void *allocate_callback_arg = nullptr, void *deallocate_callback_arg = nullptr) noexcept

Construct a new callback memory resource.

Constructs a callback memory resource that uses the user-provided callbacks allocate_callback for allocation and deallocate_callback for deallocation.

Parameters:
  • allocate_callback – The callback function used for allocation

  • deallocate_callback – The callback function used for deallocation

  • allocate_callback_arg – Additional context passed to allocate_callback. It is the caller’s responsibility to maintain the lifetime of the pointed-to data for the duration of the lifetime of the callback_memory_resource.

  • deallocate_callback_arg – Additional context passed to deallocate_callback. It is the caller’s responsibility to maintain the lifetime of the pointed-to data for the duration of the lifetime of the callback_memory_resource.

callback_memory_resource(callback_memory_resource&&) noexcept = default

Default move constructor.

callback_memory_resource &operator=(callback_memory_resource&&) noexcept = default

Default move assignment operator.

Returns:

callback_memory_resource& Reference to the assigned object

class cuda_async_memory_resource : public rmm::mr::device_memory_resource
#include <cuda_async_memory_resource.hpp>

device_memory_resource derived class that uses cudaMallocAsync/cudaFreeAsync for allocation/deallocation.

Public Types

enum class allocation_handle_type

Flags for specifying memory allocation handle types.

Note

These values are exact copies from cudaMemAllocationHandleType. We need to define our own enum here because the earliest CUDA runtime version that supports asynchronous memory pools (CUDA 11.2) did not support these flags, so we need a placeholder that can be used consistently in the constructor of cuda_async_memory_resource with all versions of CUDA >= 11.2. See the cudaMemAllocationHandleType docs at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html and ensure the enum values are kept in sync with the CUDA documentation.

Values:

enumerator none

Does not allow any export mechanism.

enumerator posix_file_descriptor

Allows a file descriptor to be used for exporting. Permitted only on POSIX systems.

enumerator win32

Allows a Win32 NT handle to be used for exporting. (HANDLE)

enumerator win32_kmt

Allows a Win32 KMT handle to be used for exporting. (D3DKMT_HANDLE)

enumerator fabric

Allows a fabric handle to be used for exporting. (cudaMemFabricHandle_t)

Public Functions

inline cuda_async_memory_resource(std::optional<std::size_t> initial_pool_size = {}, std::optional<std::size_t> release_threshold = {}, std::optional<allocation_handle_type> export_handle_type = {})

Constructs a cuda_async_memory_resource with the optionally specified initial pool size and release threshold.

If the pool size grows beyond the release threshold, unused memory held by the pool will be released at the next synchronization event.

Throws:

rmm::logic_error – if the CUDA version does not support cudaMallocAsync

Parameters:
  • initial_pool_size – Optional initial size in bytes of the pool. If no value is provided, initial pool size is half of the available GPU memory.

  • release_threshold – Optional release threshold size in bytes of the pool. If no value is provided, the release threshold is set to the total amount of memory on the current device.

  • export_handle_type – Optional cudaMemAllocationHandleType that allocations from this resource should support interprocess communication (IPC). Default is cudaMemHandleTypeNone for no IPC support.

inline cudaMemPool_t pool_handle() const noexcept

Returns the underlying native handle to the CUDA pool.

Returns:

cudaMemPool_t Handle to the underlying CUDA pool

class cuda_async_view_memory_resource : public rmm::mr::device_memory_resource
#include <cuda_async_view_memory_resource.hpp>

device_memory_resource derived class that uses cudaMallocAsync/cudaFreeAsync for allocation/deallocation.

Public Functions

inline cuda_async_view_memory_resource(cudaMemPool_t valid_pool_handle)

Constructs a cuda_async_view_memory_resource which uses an existing CUDA memory pool. The provided pool is not owned by cuda_async_view_memory_resource and must remain valid during the lifetime of the memory resource.

Throws:

rmm::logic_error – if the CUDA version does not support cudaMallocAsync

Parameters:

valid_pool_handle – Handle to a CUDA memory pool which will be used to serve allocation requests.

inline cudaMemPool_t pool_handle() const noexcept

Returns the underlying native handle to the CUDA pool.

Returns:

cudaMemPool_t Handle to the underlying CUDA pool

cuda_async_view_memory_resource(cuda_async_view_memory_resource const&) = default

Default copy constructor.

cuda_async_view_memory_resource(cuda_async_view_memory_resource&&) = default

Default move constructor.

cuda_async_view_memory_resource &operator=(cuda_async_view_memory_resource const&) = default

Default copy assignment operator.

Returns:

cuda_async_view_memory_resource& Reference to the assigned object

cuda_async_view_memory_resource &operator=(cuda_async_view_memory_resource&&) = default

Default move assignment operator.

Returns:

cuda_async_view_memory_resource& Reference to the assigned object

class cuda_memory_resource : public rmm::mr::device_memory_resource
#include <cuda_memory_resource.hpp>

device_memory_resource derived class that uses cudaMalloc/Free for allocation/deallocation.

Public Functions

cuda_memory_resource(cuda_memory_resource const&) = default

Default copy constructor.

cuda_memory_resource(cuda_memory_resource&&) = default

Default move constructor.

cuda_memory_resource &operator=(cuda_memory_resource const&) = default

Default copy assignment operator.

Returns:

cuda_memory_resource& Reference to the assigned object

cuda_memory_resource &operator=(cuda_memory_resource&&) = default

Default move assignment operator.

Returns:

cuda_memory_resource& Reference to the assigned object

class device_memory_resource
#include <device_memory_resource.hpp>

Base class for all libcudf device memory allocation.

This class serves as the interface that all custom device memory implementations must satisfy.

There are two private, pure virtual functions that all derived classes must implement: do_allocate and do_deallocate. Optionally, derived classes may also override is_equal. By default, is_equal simply performs an identity comparison.

The public, non-virtual functions allocate, deallocate, and is_equal simply call the private virtual functions. The reason for this is to allow implementing shared, default behavior in the base class. For example, the base class’ allocate function may log every allocation, no matter what derived class implementation is used.

The allocate and deallocate APIs and implementations provide stream-ordered memory allocation. This allows optimizations such as re-using memory deallocated on the same stream without the overhead of stream synchronization.

A call to allocate(bytes, stream_a) (on any derived class) returns a pointer that is valid to use on stream_a. Using the memory on a different stream (say stream_b) is Undefined Behavior unless the two streams are first synchronized, for example by using cudaStreamSynchronize(stream_a) or by recording a CUDA event on stream_a and then calling cudaStreamWaitEvent(stream_b, event).

The stream specified to deallocate() should be a stream on which it is valid to use the deallocated memory immediately for another allocation. Typically this is the stream on which the allocation was last used before the call to deallocate(). The passed stream may be used internally by a device_memory_resource for managing available memory with minimal synchronization, and it may also be synchronized at a later time, for example using a call to cudaStreamSynchronize().

For this reason, it is Undefined Behavior to destroy a CUDA stream that is passed to deallocate(). If the stream on which the allocation was last used has been destroyed before calling deallocate() or it is known that it will be destroyed, it is likely better to synchronize the stream (before destroying it) and then pass a different stream to deallocate() (e.g. the default stream).

A device_memory_resource should only be used when the active CUDA device is the same device that was active when the device_memory_resource was created. Otherwise behavior is undefined.

Creating a device_memory_resource for each device requires care to set the current device before creating each resource, and to maintain the lifetime of the resources as long as they are set as per-device resources. Here is an example loop that creates unique_ptrs to pool_memory_resource objects for each device and sets them as the per-device resource for that device.

using pool_mr = rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>;
std::vector<unique_ptr<pool_mr>> per_device_pools;
for(int i = 0; i < N; ++i) {
  cudaSetDevice(i);
  // Note: for brevity, omitting creation of upstream and computing initial_size
  per_device_pools.push_back(std::make_unique<pool_mr>(upstream, initial_size));
  set_per_device_resource(cuda_device_id{i}, &per_device_pools.back());
}

Subclassed by rmm::mr::aligned_resource_adaptor< Upstream >, rmm::mr::arena_memory_resource< Upstream >, rmm::mr::binning_memory_resource< Upstream >, rmm::mr::callback_memory_resource, rmm::mr::cuda_async_memory_resource, rmm::mr::cuda_async_view_memory_resource, rmm::mr::cuda_memory_resource, rmm::mr::failure_callback_resource_adaptor< Upstream, ExceptionType >, rmm::mr::limiting_resource_adaptor< Upstream >, rmm::mr::logging_resource_adaptor< Upstream >, rmm::mr::managed_memory_resource, rmm::mr::owning_wrapper< Resource, Upstreams >, rmm::mr::prefetch_resource_adaptor< Upstream >, rmm::mr::sam_headroom_memory_resource, rmm::mr::statistics_resource_adaptor< Upstream >, rmm::mr::system_memory_resource, rmm::mr::thread_safe_resource_adaptor< Upstream >, rmm::mr::tracking_resource_adaptor< Upstream >

Public Functions

device_memory_resource(device_memory_resource const&) = default

Default copy constructor.

device_memory_resource(device_memory_resource&&) noexcept = default

Default move constructor.

device_memory_resource &operator=(device_memory_resource const&) = default

Default copy assignment operator.

Returns:

device_memory_resource& Reference to the assigned object

device_memory_resource &operator=(device_memory_resource&&) noexcept = default

Default move assignment operator.

Returns:

device_memory_resource& Reference to the assigned object

inline void *allocate(std::size_t bytes, cuda_stream_view stream = cuda_stream_view{})

Allocates memory of size at least bytes.

The returned pointer will have at minimum 256 byte alignment.

If supported, this operation may optionally be executed on a stream. Otherwise, the stream is ignored and the null stream is used.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:
  • bytes – The size of the allocation

  • stream – Stream on which to perform allocation

Returns:

void* Pointer to the newly allocated memory

inline void deallocate(void *ptr, std::size_t bytes, cuda_stream_view stream = cuda_stream_view{})

Deallocate memory pointed to by p.

p must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.

If supported, this operation may optionally be executed on a stream. Otherwise, the stream is ignored and the null stream is used.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned p.

  • stream – Stream on which to perform deallocation

inline bool is_equal(device_memory_resource const &other) const noexcept

Compare this resource to another.

Two device_memory_resources compare equal if and only if memory allocated from one device_memory_resource can be deallocated from the other and vice versa.

By default, simply checks if *this and other refer to the same object, i.e., does not check if they are two objects of the same class.

Parameters:

other – The other resource to compare to

Returns:

If the two resources are equivalent

inline void *allocate(std::size_t bytes, std::size_t alignment)

Allocates memory of size at least bytes.

The returned pointer will have at minimum 256 byte alignment.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:
  • bytes – The size of the allocation

  • alignment – The expected alignment of the allocation

Returns:

void* Pointer to the newly allocated memory

inline void deallocate(void *ptr, std::size_t bytes, std::size_t alignment)

Deallocate memory pointed to by p.

p must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned p.

  • alignment – The alignment that was passed to the allocate call that returned p

inline void *allocate_async(std::size_t bytes, std::size_t alignment, cuda_stream_view stream)

Allocates memory of size at least bytes.

The returned pointer will have at minimum 256 byte alignment.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:
  • bytes – The size of the allocation

  • alignment – The expected alignment of the allocation

  • stream – Stream on which to perform allocation

Returns:

void* Pointer to the newly allocated memory

inline void *allocate_async(std::size_t bytes, cuda_stream_view stream)

Allocates memory of size at least bytes.

The returned pointer will have at minimum 256 byte alignment.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:
  • bytes – The size of the allocation

  • stream – Stream on which to perform allocation

Returns:

void* Pointer to the newly allocated memory

inline void deallocate_async(void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view stream)

Deallocate memory pointed to by p.

p must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned p.

  • alignment – The alignment that was passed to the allocate call that returned p

  • stream – Stream on which to perform allocation

inline void deallocate_async(void *ptr, std::size_t bytes, cuda_stream_view stream)

Deallocate memory pointed to by p.

p must have been returned by a prior call to allocate(bytes, stream) on a device_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned p.

  • stream – Stream on which to perform allocation

inline bool operator==(device_memory_resource const &other) const noexcept

Comparison operator with another device_memory_resource.

Parameters:

other – The other resource to compare to

Returns:

true If the two resources are equivalent

Returns:

false If the two resources are not equivalent

inline bool operator!=(device_memory_resource const &other) const noexcept

Comparison operator with another device_memory_resource.

Parameters:

other – The other resource to compare to

Returns:

false If the two resources are equivalent

Returns:

true If the two resources are not equivalent

Friends

inline friend void get_property(device_memory_resource const&, cuda::mr::device_accessible) noexcept

Enables the cuda::mr::device_accessible property.

This property declares that a device_memory_resource provides device accessible memory

template<typename Upstream>
class fixed_size_memory_resource : public detail::stream_ordered_memory_resource<fixed_size_memory_resource<Upstream>, detail::fixed_size_free_list>
#include <fixed_size_memory_resource.hpp>

A device_memory_resource which allocates memory blocks of a single fixed size.

Supports only allocations of size smaller than the configured block_size.

Public Functions

inline explicit fixed_size_memory_resource(device_async_resource_ref upstream_mr, std::size_t block_size = default_block_size, std::size_t blocks_to_preallocate = default_blocks_to_preallocate)

Construct a new fixed_size_memory_resource that allocates memory from upstream_mr.

When the pool of blocks is all allocated, grows the pool by allocating blocks_to_preallocate more blocks from upstream_mr.

Parameters:
  • upstream_mr – The device_async_resource_ref from which to allocate blocks for the pool.

  • block_size – The size of blocks to allocate.

  • blocks_to_preallocate – The number of blocks to allocate to initialize the pool.

inline explicit fixed_size_memory_resource(Upstream *upstream_mr, std::size_t block_size = default_block_size, std::size_t blocks_to_preallocate = default_blocks_to_preallocate)

Construct a new fixed_size_memory_resource that allocates memory from upstream_mr.

When the pool of blocks is all allocated, grows the pool by allocating blocks_to_preallocate more blocks from upstream_mr.

Parameters:
  • upstream_mr – The memory_resource from which to allocate blocks for the pool.

  • block_size – The size of blocks to allocate.

  • blocks_to_preallocate – The number of blocks to allocate to initialize the pool.

inline ~fixed_size_memory_resource() override

Destroy the fixed_size_memory_resource and free all memory allocated from upstream.

inline device_async_resource_ref get_upstream_resource() const noexcept

device_async_resource_ref to the upstream resource

Returns:

device_async_resource_ref to the upstream resource

inline std::size_t get_block_size() const noexcept

Get the size of blocks allocated by this memory resource.

Returns:

std::size_t size in bytes of allocated blocks.

Public Static Attributes

static constexpr std::size_t default_block_size = 1 << 20

Default allocation block size.

static constexpr std::size_t default_blocks_to_preallocate = 128

The number of blocks that the pool starts out with, and also the number of blocks by which the pool grows when all of its current blocks are allocated

class managed_memory_resource : public rmm::mr::device_memory_resource
#include <managed_memory_resource.hpp>

device_memory_resource derived class that uses cudaMallocManaged/Free for allocation/deallocation.

Public Functions

managed_memory_resource(managed_memory_resource const&) = default

Default copy constructor.

managed_memory_resource(managed_memory_resource&&) = default

Default move constructor.

managed_memory_resource &operator=(managed_memory_resource const&) = default

Default copy assignment operator.

Returns:

managed_memory_resource& Reference to the assigned object

managed_memory_resource &operator=(managed_memory_resource&&) = default

Default move assignment operator.

Returns:

managed_memory_resource& Reference to the assigned object

template<typename T>
class polymorphic_allocator
#include <polymorphic_allocator.hpp>

A stream ordered Allocator using a rmm::mr::device_memory_resource to satisfy (de)allocations.

Similar to std::pmr::polymorphic_allocator, uses the runtime polymorphism of device_memory_resource to allow containers with polymorphic_allocator as their static allocator type to be interoperable, but exhibit different behavior depending on resource used.

Unlike STL allocators, polymorphic_allocator’s allocate and deallocate functions are stream ordered. Use stream_allocator_adaptor to allow interoperability with interfaces that require standard, non stream-ordered Allocator interfaces.

Template Parameters:

T – The allocators value type.

Public Types

using value_type = T

T, the value type of objects allocated by this allocator.

Public Functions

polymorphic_allocator() = default

Construct a polymorphic_allocator using the return value of rmm::mr::get_current_device_resource_ref() as the underlying memory resource.

inline polymorphic_allocator(device_async_resource_ref mr)

Construct a polymorphic_allocator using the provided memory resource.

This constructor provides an implicit conversion from device_async_resource_ref.

Parameters:

mr – The upstream memory resource to use for allocation.

template<typename U>
inline polymorphic_allocator(polymorphic_allocator<U> const &other) noexcept

Construct a polymorphic_allocator using the underlying memory resource of other.

Parameters:

other – The polymorphic_allocator whose memory resource will be used as the underlying resource of the new polymorphic_allocator.

inline value_type *allocate(std::size_t num, cuda_stream_view stream)

Allocates storage for num objects of type T using the underlying memory resource.

Parameters:
  • num – The number of objects to allocate storage for

  • stream – The stream on which to perform the allocation

Returns:

Pointer to the allocated storage

inline void deallocate(value_type *ptr, std::size_t num, cuda_stream_view stream)

Deallocates storage pointed to by ptr.

ptr must have been allocated from a memory resource r that compares equal to get_upstream_resource() using r.allocate(n * sizeof(T)).

Parameters:
  • ptr – Pointer to memory to deallocate

  • num – Number of objects originally allocated

  • stream – Stream on which to perform the deallocation

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

template<typename Allocator>
class stream_allocator_adaptor
#include <polymorphic_allocator.hpp>

Adapts a stream ordered allocator to provide a standard Allocator interface.

A stream-ordered allocator (i.e., allocate/deallocate use a cuda_stream_view) cannot be used in an interface that expects a standard C++ Allocator interface. stream_allocator_adaptor wraps a stream-ordered allocator and a stream to provide a standard Allocator interface. The adaptor uses the wrapped stream in calls to the underlying allocator’s allocate and deallocate functions.

Example:

my_stream_ordered_allocator<int> a{...};
cuda_stream_view s = // create stream;

auto adapted = stream_allocator_adaptor(a, s);

// Allocates storage for `n` int's on stream `s`
int * p = std::allocator_traits<decltype(adapted)>::allocate(adapted, n);

Template Parameters:

Allocator – Stream ordered allocator type to adapt

Public Types

using value_type = typename std::allocator_traits<Allocator>::value_type

The value type of objects allocated by this allocator

Public Functions

inline stream_allocator_adaptor(Allocator const &allocator, cuda_stream_view stream)

Construct a stream_allocator_adaptor using a as the underlying allocator.

Note

: The stream must not be destroyed before the stream_allocator_adaptor, otherwise behavior is undefined.

Parameters:
  • allocator – The stream ordered allocator to use as the underlying allocator

  • stream – The stream used with the underlying allocator

template<typename OtherAllocator>
inline stream_allocator_adaptor(stream_allocator_adaptor<OtherAllocator> const &other)

Construct a stream_allocator_adaptor using other.underlying_allocator() and other.stream() as the underlying allocator and stream.

Template Parameters:

OtherAllocator – Type of other’s underlying allocator

Parameters:

other – The other stream_allocator_adaptor whose underlying allocator and stream will be copied

inline value_type *allocate(std::size_t num)

Allocates storage for num objects of type T using the underlying allocator on stream().

Parameters:

num – The number of objects to allocate storage for

Returns:

Pointer to the allocated storage

inline void deallocate(value_type *ptr, std::size_t num)

Deallocates storage pointed to by ptr using the underlying allocator on stream().

ptr must have been allocated from by an allocator a that compares equal to underlying_allocator() using a.allocate(n).

Parameters:
  • ptr – Pointer to memory to deallocate

  • num – Number of objects originally allocated

inline cuda_stream_view stream() const noexcept

The stream on which calls to the underlying allocator are made.

Returns:

The stream on which calls to the underlying allocator are made

inline Allocator underlying_allocator() const noexcept

The underlying allocator.

Returns:

The underlying allocator

template<typename T>
struct rebind
#include <polymorphic_allocator.hpp>

Rebinds the allocator to the specified type.

Template Parameters:

T – The desired value_type of the rebound allocator type

Public Types

using other = stream_allocator_adaptor<typename std::allocator_traits<Allocator>::template rebind_alloc<T>>

The type to bind to.

template<typename Upstream>
class pool_memory_resource : public detail::maybe_remove_property<pool_memory_resource<Upstream>, Upstream, cuda::mr::device_accessible>, public detail::stream_ordered_memory_resource<pool_memory_resource<Upstream>, detail::coalescing_free_list>, public cuda::forward_property<pool_memory_resource<Upstream>, Upstream>
#include <pool_memory_resource.hpp>

A coalescing best-fit suballocator which uses a pool of memory allocated from an upstream memory_resource.

Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with CUDA per-thread default stream.

Template Parameters:

UpstreamResource – memory_resource to use for allocating the pool. Implements rmm::mr::device_memory_resource interface.

Public Functions

inline explicit pool_memory_resource(device_async_resource_ref upstream_mr, std::size_t initial_pool_size, std::optional<std::size_t> maximum_pool_size = std::nullopt)

Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.

Throws:
  • rmm::logic_error – if initial_pool_size is not aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

  • rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

Parameters:
  • upstream_mr – The memory_resource from which to allocate blocks for the pool.

  • initial_pool_size – Minimum size, in bytes, of the initial pool.

  • maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available from the upstream resource.

inline explicit pool_memory_resource(Upstream *upstream_mr, std::size_t initial_pool_size, std::optional<std::size_t> maximum_pool_size = std::nullopt)

Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.

Throws:
  • rmm::logic_error – if upstream_mr == nullptr

  • rmm::logic_error – if initial_pool_size is not aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

  • rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

Parameters:
  • upstream_mr – The memory_resource from which to allocate blocks for the pool.

  • initial_pool_size – Minimum size, in bytes, of the initial pool.

  • maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available from the upstream resource.

template<typename Upstream2 = Upstream, cuda::std::enable_if_t<cuda::mr::async_resource<Upstream2>, int> = 0>
inline explicit pool_memory_resource(Upstream2 &upstream_mr, std::size_t initial_pool_size, std::optional<std::size_t> maximum_pool_size = std::nullopt)

Construct a pool_memory_resource and allocate the initial device memory pool using upstream_mr.

Throws:
  • rmm::logic_error – if upstream_mr == nullptr

  • rmm::logic_error – if initial_pool_size is not aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

  • rmm::logic_error – if maximum_pool_size is neither the default nor aligned to a multiple of pool_memory_resource::allocation_alignment bytes.

Parameters:
  • upstream_mr – The memory_resource from which to allocate blocks for the pool.

  • initial_pool_size – Minimum size, in bytes, of the initial pool.

  • maximum_pool_size – Maximum size, in bytes, that the pool can grow to. Defaults to all of the available memory from the upstream resource.

inline ~pool_memory_resource() override

Destroy the pool_memory_resource and deallocate all memory it allocated using the upstream resource.

inline device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

inline std::size_t pool_size() const noexcept

Computes the size of the current pool.

Includes allocated as well as free memory.

Returns:

std::size_t The total size of the currently allocated pool.

class sam_headroom_memory_resource : public rmm::mr::device_memory_resource
#include <sam_headroom_memory_resource.hpp>

Resource that uses system memory resource to allocate memory with a headroom.

System allocated memory (SAM) can be migrated to the GPU, but is never migrated back the host. If GPU memory is over-subscribed, this can cause other CUDA calls to fail with out-of-memory errors. To work around this problem, when using a system memory resource, we reserve some GPU memory as headroom for other CUDA calls, and only conditionally set its preferred location to the GPU if the allocation would not eat into the headroom.

Since doing this check on every allocation can be expensive, the caller may choose to use other allocators (e.g. binning_memory_resource) for small allocations, and use this allocator for large allocations only.

Public Functions

inline explicit sam_headroom_memory_resource(std::size_t headroom)

Construct a headroom memory resource.

Parameters:

headroom – Size of the reserved GPU memory as headroom

class system_memory_resource : public rmm::mr::device_memory_resource
#include <system_memory_resource.hpp>

device_memory_resource derived class that uses malloc/free for allocation/deallocation.

There are two flavors of hardware/software environments that support accessing system allocated memory (SAM) from the GPU: HMM and ATS.

Heterogeneous Memory Management (HMM) is a software-based solution for PCIe-connected GPUs on x86 systems. Requirements:

  • NVIDIA CUDA 12.2 with the open-source r535_00 driver or newer.

  • A sufficiently recent Linux kernel: 6.1.24+, 6.2.11+, or 6.3+.

  • A GPU with one of the following supported architectures: NVIDIA Turing, NVIDIA Ampere, NVIDIA Ada Lovelace, NVIDIA Hopper, or newer.

  • A 64-bit x86 CPU.

For more information, see https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/.

Address Translation Services (ATS) is a hardware/software solution for the Grace Hopper Superchip that uses the NVLink Chip-2-Chip (C2C) interconnect to provide coherent memory. For more information, see https://developer.nvidia.com/blog/nvidia-grace-hopper-superchip-architecture-in-depth/.

Public Functions

system_memory_resource(system_memory_resource const&) = default

Default copy constructor.

system_memory_resource(system_memory_resource&&) = default

Default copy constructor.

system_memory_resource &operator=(system_memory_resource const&) = default

Default copy assignment operator.

Returns:

system_memory_resource& Reference to the assigned object

system_memory_resource &operator=(system_memory_resource&&) = default

Default move assignment operator.

Returns:

system_memory_resource& Reference to the assigned object

group host_memory_resources
class host_memory_resource
#include <host_memory_resource.hpp>

Base class for host memory allocation.

This is based on std::pmr::memory_resource: https://en.cppreference.com/w/cpp/memory/memory_resource

When C++17 is available for use in RMM, rmm::host_memory_resource should inherit from std::pmr::memory_resource.

This class serves as the interface that all host memory resource implementations must satisfy.

There are two private, pure virtual functions that all derived classes must implement: do_allocate and do_deallocate. Optionally, derived classes may also override is_equal. By default, is_equal simply performs an identity comparison.

The public, non-virtual functions allocate, deallocate, and is_equal simply call the private virtual functions. The reason for this is to allow implementing shared, default behavior in the base class. For example, the base class’ allocate function may log every allocation, no matter what derived class implementation is used.

Subclassed by rmm::mr::new_delete_resource, rmm::mr::pinned_memory_resource

Public Functions

host_memory_resource(host_memory_resource const&) = default

Default copy constructor.

host_memory_resource(host_memory_resource&&) noexcept = default

Default move constructor.

host_memory_resource &operator=(host_memory_resource const&) = default

Default copy assignment operator.

Returns:

host_memory_resource& Reference to the assigned object

host_memory_resource &operator=(host_memory_resource&&) noexcept = default

Default move assignment operator.

Returns:

host_memory_resource& Reference to the assigned object

inline void *allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))

Allocates memory on the host of size at least bytes bytes.

The returned storage is aligned to the specified alignment if supported, and to alignof(std::max_align_t) otherwise.

Throws:

std::bad_alloc – When the requested bytes and alignment cannot be allocated.

Parameters:
  • bytes – The size of the allocation

  • alignment – Alignment of the allocation

Returns:

void* Pointer to the newly allocated memory

inline void deallocate(void *ptr, std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))

Deallocate memory pointed to by ptr.

ptr must have been returned by a prior call to allocate(bytes,alignment) on a host_memory_resource that compares equal to *this, and the storage it points to must not yet have been deallocated, otherwise behavior is undefined.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned ptr.

  • alignment – Alignment of the allocation. This must be equal to the value of alignment that was passed to the allocate call that returned ptr.

inline bool is_equal(host_memory_resource const &other) const noexcept

Compare this resource to another.

Two host_memory_resources compare equal if and only if memory allocated from one host_memory_resource can be deallocated from the other and vice versa.

By default, simply checks if *this and other refer to the same object, i.e., does not check if they are two objects of the same class.

Parameters:

other – The other resource to compare to

Returns:

true if the two resources are equivalent

inline bool operator==(host_memory_resource const &other) const noexcept

Comparison operator with another device_memory_resource.

Parameters:

other – The other resource to compare to

Returns:

true If the two resources are equivalent

Returns:

false If the two resources are not equivalent

inline bool operator!=(host_memory_resource const &other) const noexcept

Comparison operator with another device_memory_resource.

Parameters:

other – The other resource to compare to

Returns:

false If the two resources are equivalent

Returns:

true If the two resources are not equivalent

Friends

inline friend void get_property(host_memory_resource const&, cuda::mr::host_accessible) noexcept

Enables the cuda::mr::host_accessible property.

This property declares that a host_memory_resource provides host accessible memory

class new_delete_resource : public rmm::mr::host_memory_resource
#include <new_delete_resource.hpp>

A host_memory_resource that uses the global operator new and operator delete to allocate host memory.

Public Functions

new_delete_resource(new_delete_resource const&) = default

Default copy constructor.

new_delete_resource(new_delete_resource&&) = default

Default move constructor.

new_delete_resource &operator=(new_delete_resource const&) = default

Default copy assignment operator.

Returns:

new_delete_resource& Reference to the assigned object

new_delete_resource &operator=(new_delete_resource&&) = default

Default move assignment operator.

Returns:

new_delete_resource& Reference to the assigned object

class pinned_memory_resource : public rmm::mr::host_memory_resource
#include <pinned_memory_resource.hpp>

A host_memory_resource that uses cudaMallocHost to allocate pinned/page-locked host memory.

See https://devblogs.nvidia.com/how-optimize-data-transfers-cuda-cc/

Public Functions

pinned_memory_resource(pinned_memory_resource const&) = default

Default copy constructor.

pinned_memory_resource(pinned_memory_resource&&) = default

Default move constructor.

pinned_memory_resource &operator=(pinned_memory_resource const&) = default

Default copy assignment operator.

Returns:

pinned_memory_resource& Reference to the assigned object

pinned_memory_resource &operator=(pinned_memory_resource&&) = default

Default move assignment operator.

Returns:

pinned_memory_resource& Reference to the assigned object

inline void *allocate_async(std::size_t bytes, std::size_t alignment, cuda_stream_view)

Pretend to support the allocate_async interface, falling back to stream 0.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:
  • bytes – The size of the allocation

  • alignment – The expected alignment of the allocation

Returns:

void* Pointer to the newly allocated memory

inline void *allocate_async(std::size_t bytes, cuda_stream_view)

Pretend to support the allocate_async interface, falling back to stream 0.

Throws:

rmm::bad_alloc – When the requested bytes cannot be allocated on the specified stream.

Parameters:

bytes – The size of the allocation

Returns:

void* Pointer to the newly allocated memory

inline void deallocate_async(void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view)

Pretend to support the deallocate_async interface, falling back to stream 0.

Parameters:
  • ptr – Pointer to be deallocated

  • bytes – The size in bytes of the allocation. This must be equal to the value of bytes that was passed to the allocate call that returned p.

  • alignment – The alignment that was passed to the allocate call that returned p

Friends

inline friend void get_property(pinned_memory_resource const&, cuda::mr::device_accessible) noexcept

Enables the cuda::mr::device_accessible property.

This property declares that a pinned_memory_resource provides device accessible memory

group device_resource_adaptors

Typedefs

using failure_callback_t = std::function<bool(std::size_t, void*)>

Callback function type used by failure_callback_resource_adaptor.

The resource adaptor calls this function when a memory allocation throws a specified exception type. The function decides whether the resource adaptor should try to allocate the memory again or re-throw the exception.

The callback function signature is: bool failure_callback_t(std::size_t bytes, void* callback_arg)

The callback function is passed two parameters: bytes is the size of the failed memory allocation and arg is the extra argument passed to the constructor of the failure_callback_resource_adaptor. The callback function returns a Boolean where true means to retry the memory allocation and false means to re-throw the exception.

Functions

template<template<typename...> class Resource, typename ...Upstreams, typename ...Args>
auto make_owning_wrapper(std::tuple<std::shared_ptr<Upstreams>...> upstreams, Args&&... args)

Constructs a resource of type Resource wrapped in an owning_wrapper using upstreams as the upstream resources and args as the additional parameters for the constructor of Resource.

template <typename Upstream1, typename Upstream2>
class example_resource{
  example_resource(Upstream1 * u1, Upstream2 * u2, int n, float f);
};

auto cuda_mr = std::make_shared<rmm::mr::cuda_memory_resource>();
auto cuda_upstreams = std::make_tuple(cuda_mr, cuda_mr);

// Constructs an `example_resource<rmm::mr::cuda_memory_resource, rmm::mr::cuda_memory_resource>`
// wrapped by an `owning_wrapper` taking shared ownership of `cuda_mr` and using it as both of
// `example_resource`s upstream resources. Forwards the  arguments `42` and `3.14` to the
// additional `n` and `f` arguments of `example_resource` constructor.
auto wrapped_example = rmm::mr::make_owning_wrapper<example_resource>(cuda_upstreams, 42, 3.14);
Template Parameters:
  • Resource – Template template parameter specifying the type of the wrapped resource to construct

  • Upstreams – Types of the upstream resources

  • Args – Types of the arguments used in Resources constructor

Parameters:
  • upstreams – Tuple of std::shared_ptrs to the upstreams used by the wrapped resource, in the same order as expected by Resources constructor.

  • args – Function parameter pack of arguments to forward to the wrapped resource’s constructor

Returns:

An owning_wrapper wrapping a newly constructed Resource<Upstreams...> and upstreams.

template<template<typename> class Resource, typename Upstream, typename ...Args>
auto make_owning_wrapper(std::shared_ptr<Upstream> upstream, Args&&... args)

Additional convenience factory for owning_wrapper when Resource has only a single upstream resource.

When a resource has only a single upstream, it can be inconvenient to construct a std::tuple of the upstream resource. This factory allows specifying the single upstream as just a std::shared_ptr.

Template Parameters:
  • Resource – Type of the wrapped resource to construct

  • Upstream – Type of the single upstream resource

  • Args – Types of the arguments used in Resources constructor

Parameters:
  • upstreamstd::shared_ptr to the upstream resource

  • args – Function parameter pack of arguments to forward to the wrapped resource’s constructor

Returns:

An owning_wrapper wrapping a newly construct Resource<Upstream> and upstream.

template<typename Upstream>
class aligned_resource_adaptor : public rmm::mr::device_memory_resource
#include <aligned_resource_adaptor.hpp>

Resource that adapts Upstream memory resource to allocate memory in a specified alignment size.

An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests. This adaptor wraps allocations and deallocations from Upstream using the given alignment size.

By default, any address returned by one of the memory allocation routines from the CUDA driver or runtime API is always aligned to at least 256 bytes. For some use cases, such as GPUDirect Storage (GDS), allocations need to be aligned to a larger size (4 KiB for GDS) in order to avoid additional copies to bounce buffers.

Since a larger alignment size has some additional overhead, the user can specify a threshold size. If an allocation’s size falls below the threshold, it is aligned to the default size. Only allocations with a size above the threshold are aligned to the custom alignment size.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Functions

inline explicit aligned_resource_adaptor(device_async_resource_ref upstream, std::size_t alignment = rmm::CUDA_ALLOCATION_ALIGNMENT, std::size_t alignment_threshold = default_alignment_threshold)

Construct an aligned resource adaptor using upstream to satisfy allocation requests.

Throws:

rmm::logic_error – if allocation_alignment is not a power of 2

Parameters:
  • upstream – The resource used for allocating/deallocating device memory.

  • alignment – The size used for allocation alignment.

  • alignment_threshold – Only allocations with a size larger than or equal to this threshold are aligned.

inline explicit aligned_resource_adaptor(Upstream *upstream, std::size_t alignment = rmm::CUDA_ALLOCATION_ALIGNMENT, std::size_t alignment_threshold = default_alignment_threshold)

Construct an aligned resource adaptor using upstream to satisfy allocation requests.

Throws:
Parameters:
  • upstream – The resource used for allocating/deallocating device memory.

  • alignment – The size used for allocation alignment.

  • alignment_threshold – Only allocations with a size larger than or equal to this threshold are aligned.

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

Public Static Attributes

static constexpr std::size_t default_alignment_threshold = 0

The default alignment used by the adaptor.

template<typename Upstream, typename ExceptionType = rmm::out_of_memory>
class failure_callback_resource_adaptor : public rmm::mr::device_memory_resource
#include <failure_callback_resource_adaptor.hpp>

A device memory resource that calls a callback function when allocations throw a specified exception type.

An instance of this resource must be constructed with an existing, upstream resource in order to satisfy allocation requests.

The callback function takes an allocation size and a callback argument and returns a bool representing whether to retry the allocation (true) or re-throw the caught exception (false).

When implementing a callback function for allocation retry, care must be taken to avoid an infinite loop. The following example makes sure to only retry the allocation once:

using failure_callback_adaptor =
  rmm::mr::failure_callback_resource_adaptor<rmm::mr::device_memory_resource>;

bool failure_handler(std::size_t bytes, void* arg)
{
  bool& retried = *reinterpret_cast<bool*>(arg);
  if (!retried) {
    retried = true;
    return true;  // First time we request an allocation retry
  }
  return false;  // Second time we let the adaptor throw std::bad_alloc
}

int main()
{
  bool retried{false};
  failure_callback_adaptor mr{
    rmm::mr::get_current_device_resource_ref(), failure_handler, &retried
  };
  rmm::mr::set_current_device_resource_ref(mr);
}
Template Parameters:
  • Upstream – The type of the upstream resource used for allocation/deallocation.

  • ExceptionType – The type of exception that this adaptor should respond to

Public Types

using exception_type = ExceptionType

The type of exception this object catches/throws.

Public Functions

inline failure_callback_resource_adaptor(device_async_resource_ref upstream, failure_callback_t callback, void *callback_arg)

Construct a new failure_callback_resource_adaptor using upstream to satisfy allocation requests.

See also

failure_callback_t

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • callback – Callback function

  • callback_arg – Extra argument passed to callback

inline failure_callback_resource_adaptor(Upstream *upstream, failure_callback_t callback, void *callback_arg)

Construct a new failure_callback_resource_adaptor using upstream to satisfy allocation requests.

See also

failure_callback_t

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • callback – Callback function

  • callback_arg – Extra argument passed to callback

failure_callback_resource_adaptor(failure_callback_resource_adaptor&&) noexcept = default

Default move constructor.

failure_callback_resource_adaptor &operator=(failure_callback_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

failure_callback_resource_adaptor& Reference to the assigned object

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

template<typename Upstream>
class limiting_resource_adaptor : public rmm::mr::device_memory_resource
#include <limiting_resource_adaptor.hpp>

Resource that uses Upstream to allocate memory and limits the total allocations possible.

An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. Atomics are used to make this thread-safe, but note that the get_allocated_bytes may not include in-flight allocations.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Functions

inline limiting_resource_adaptor(device_async_resource_ref upstream, std::size_t allocation_limit, std::size_t alignment = CUDA_ALLOCATION_ALIGNMENT)

Construct a new limiting resource adaptor using upstream to satisfy allocation requests and limiting the total allocation amount possible.

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • allocation_limit – Maximum memory allowed for this allocator

  • alignment – Alignment in bytes for the start of each allocated buffer

inline limiting_resource_adaptor(Upstream *upstream, std::size_t allocation_limit, std::size_t alignment = CUDA_ALLOCATION_ALIGNMENT)

Construct a new limiting resource adaptor using upstream to satisfy allocation requests and limiting the total allocation amount possible.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • allocation_limit – Maximum memory allowed for this allocator

  • alignment – Alignment in bytes for the start of each allocated buffer

limiting_resource_adaptor(limiting_resource_adaptor&&) noexcept = default

Default move constructor.

limiting_resource_adaptor &operator=(limiting_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

limiting_resource_adaptor& Reference to the assigned object

inline device_async_resource_ref get_upstream_resource() const noexcept

device_async_resource_ref to the upstream resource

Returns:

device_async_resource_ref to the upstream resource

inline std::size_t get_allocated_bytes() const

Query the number of bytes that have been allocated. Note that this can not be used to know how large of an allocation is possible due to both possible fragmentation and also internal page sizes and alignment that is not tracked by this allocator.

Returns:

std::size_t number of bytes that have been allocated through this allocator.

inline std::size_t get_allocation_limit() const

Query the maximum number of bytes that this allocator is allowed to allocate. This is the limit on the allocator and not a representation of the underlying device. The device may not be able to support this limit.

Returns:

std::size_t max number of bytes allowed for this allocator

template<typename Upstream>
class logging_resource_adaptor : public rmm::mr::device_memory_resource
#include <logging_resource_adaptor.hpp>

Resource that uses Upstream to allocate memory and logs information about the requested allocation/deallocations.

An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests and log allocation/deallocation activity.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Functions

inline logging_resource_adaptor(Upstream *upstream, std::string const &filename = get_default_filename(), bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the file specified by filename.

The logfile will be written using CSV formatting.

Clears the contents of filename if it already exists.

Creating multiple logging_resource_adaptors with the same filename will result in undefined behavior.

Throws:
  • rmm::logic_error – if upstream == nullptr

  • spdlog::spdlog_ex – if opening filename failed

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • filename – Name of file to write log info. If not specified, retrieves the file name from the environment variable “RMM_LOG_FILE”.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

inline logging_resource_adaptor(Upstream *upstream, std::ostream &stream, bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the ostream specified by stream.

The logfile will be written using CSV formatting.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • stream – The ostream to write log info.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

inline logging_resource_adaptor(Upstream *upstream, std::initializer_list<sink_ptr> sinks, bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the ostream specified by stream.

The logfile will be written using CSV formatting.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • sinks – A list of logging sinks to which log output will be written.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

inline logging_resource_adaptor(device_async_resource_ref upstream, std::string const &filename = get_default_filename(), bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the file specified by filename.

The logfile will be written using CSV formatting.

Clears the contents of filename if it already exists.

Creating multiple logging_resource_adaptors with the same filename will result in undefined behavior.

Throws:

spdlog::spdlog_ex – if opening filename failed

Parameters:
  • upstream – The resource_ref used for allocating/deallocating device memory.

  • filename – Name of file to write log info. If not specified, retrieves the file name from the environment variable “RMM_LOG_FILE”.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

inline logging_resource_adaptor(device_async_resource_ref upstream, std::ostream &stream, bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the ostream specified by stream.

The logfile will be written using CSV formatting.

Parameters:
  • upstream – The resource_ref used for allocating/deallocating device memory.

  • stream – The ostream to write log info.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

inline logging_resource_adaptor(device_async_resource_ref upstream, std::initializer_list<sink_ptr> sinks, bool auto_flush = false)

Construct a new logging resource adaptor using upstream to satisfy allocation requests and logging information about each allocation/free to the ostream specified by stream.

The logfile will be written using CSV formatting.

Parameters:
  • upstream – The resource_ref used for allocating/deallocating device memory.

  • sinks – A list of logging sinks to which log output will be written.

  • auto_flush – If true, flushes the log for every (de)allocation. Warning, this will degrade performance.

logging_resource_adaptor(logging_resource_adaptor&&) noexcept = default

Default move constructor.

logging_resource_adaptor &operator=(logging_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

logging_resource_adaptor& Reference to the assigned object

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

inline void flush()

Flush logger contents.

inline std::string header() const

Return the CSV header string.

Returns:

CSV formatted header string of column names

Public Static Functions

static inline std::string get_default_filename()

Return the value of the environment variable RMM_LOG_FILE.

Throws:

rmm::logic_error – if RMM_LOG_FILE is not set.

Returns:

The value of RMM_LOG_FILE as std::string.

template<typename Resource, typename ...Upstreams>
class owning_wrapper : public rmm::mr::device_memory_resource
#include <owning_wrapper.hpp>

Resource adaptor that maintains the lifetime of upstream resources.

Many device_memory_resource derived types allocate memory from another “upstream” resource. E.g., pool_memory_resource allocates its pool from an upstream resource. Typically, a resource does not own its upstream, and therefore it is the user’s responsibility to maintain the lifetime of the upstream resource. This can be inconvenient and error prone, especially for resources with complex upstreams that may themselves also have an upstream.

owning_wrapper simplifies lifetime management of a resource, wrapped, by taking shared ownership of all upstream resources via a std::shared_ptr.

For convenience, it is recommended to use the make_owning_wrapper factory instead of constructing an owning_wrapper directly.

Example:

auto cuda = std::make_shared<rmm::mr::cuda_memory_resource>();
auto pool = rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(cuda,initial_pool_size,
                                                                        max_pool_size);
// The `cuda` resource will be kept alive for the lifetime of `pool` and automatically be
// destroyed after `pool` is destroyed

Template Parameters:
  • Resource – Type of the wrapped resource

  • Upstreams – Template parameter pack of the types of the upstream resources used by Resource

Public Types

using upstream_tuple = std::tuple<std::shared_ptr<Upstreams>...>

Tuple of upstream memory resources.

Public Functions

template<typename ...Args>
inline owning_wrapper(upstream_tuple upstreams, Args&&... args)

Constructs the wrapped resource using the provided upstreams and any additional arguments forwarded to the wrapped resources constructor.

Resource is required to have a constructor whose first argument(s) are raw pointers to its upstream resources in the same order as upstreams, followed by any additional arguments in the same order as args.

Example:

template <typename Upstream1, typename Upstream2>
class example_resource{
  example_resource(Upstream1 * u1, Upstream2 * u2, int n, float f);
};

using cuda = rmm::mr::cuda_memory_resource;
using example = example_resource<cuda,cuda>;
using wrapped_example = rmm::mr::owning_wrapper<example, cuda, cuda>;
auto cuda_mr = std::make_shared<cuda>();

// Constructs an `example_resource` wrapped by an `owning_wrapper` taking shared ownership of
//`cuda_mr` and using it as both of `example_resource`s upstream resources. Forwards the
// arguments `42` and `3.14` to the additional `n` and `f` arguments of `example_resources`
// constructor.
wrapped_example w{std::make_tuple(cuda_mr,cuda_mr), 42, 3.14};

Template Parameters:

Args – Template parameter pack to forward to the wrapped resource’s constructor

Parameters:
  • upstreams – Tuple of std::shared_ptrs to the upstreams used by the wrapped resource, in the same order as expected by Resources constructor.

  • args – Function parameter pack of arguments to forward to the wrapped resource’s constructor

inline Resource const &wrapped() const noexcept

A constant reference to the wrapped resource.

Returns:

A constant reference to the wrapped resource

inline Resource &wrapped() noexcept

A reference to the wrapped resource.

Returns:

A reference to the wrapped resource

template<typename Upstream>
class prefetch_resource_adaptor : public rmm::mr::device_memory_resource
#include <prefetch_resource_adaptor.hpp>

Resource that prefetches all memory allocations.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Functions

inline prefetch_resource_adaptor(device_async_resource_ref upstream)

Construct a new prefetch resource adaptor using upstream to satisfy allocation requests.

Parameters:

upstream – The resource_ref used for allocating/deallocating device memory

inline prefetch_resource_adaptor(Upstream *upstream)

Construct a new prefetch resource adaptor using upstream to satisfy allocation requests.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:

upstream – The resource used for allocating/deallocating device memory

prefetch_resource_adaptor(prefetch_resource_adaptor&&) noexcept = default

Default move constructor.

prefetch_resource_adaptor &operator=(prefetch_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

prefetch_resource_adaptor& Reference to the assigned object

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

template<typename Upstream>
class statistics_resource_adaptor : public rmm::mr::device_memory_resource
#include <statistics_resource_adaptor.hpp>

Resource that uses Upstream to allocate memory and tracks statistics on memory allocations.

An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. Tracking statistics stores the current, peak and total memory allocations for both the number of bytes and number of calls to the memory resource.

This resource supports nested statistics, which makes it possible to track statistics of a code block. Use .push_counters() to start tracking statistics on a code block and use .pop_counters() to stop the tracking. The nested statistics are cascading such that the statistics tracked by a code block include the statistics tracked in all its tracked sub code blocks.

statistics_resource_adaptor is intended as a debug adaptor and shouldn’t be used in performance-sensitive code.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Types

using read_lock_t = std::shared_lock<std::shared_mutex>

Type of lock used to synchronize read access.

using write_lock_t = std::unique_lock<std::shared_mutex>

Type of lock used to synchronize write access.

Public Functions

inline statistics_resource_adaptor(device_async_resource_ref upstream)

Construct a new statistics resource adaptor using upstream to satisfy allocation requests.

Parameters:

upstream – The resource_ref used for allocating/deallocating device memory.

inline statistics_resource_adaptor(Upstream *upstream)

Construct a new statistics resource adaptor using upstream to satisfy allocation requests.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:

upstream – The resource used for allocating/deallocating device memory.

statistics_resource_adaptor(statistics_resource_adaptor&&) noexcept = default

Default move constructor.

statistics_resource_adaptor &operator=(statistics_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

statistics_resource_adaptor& Reference to the assigned object

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

inline counter get_bytes_counter() const noexcept

Returns a counter struct for this adaptor containing the current, peak, and total number of allocated bytes for this adaptor since it was created.

Returns:

counter struct containing bytes count

inline counter get_allocations_counter() const noexcept

Returns a counter struct for this adaptor containing the current, peak, and total number of allocation counts for this adaptor since it was created.

Returns:

counter struct containing allocations count

inline std::pair<counter, counter> push_counters()

Push a pair of zero counters on the stack, which becomes the new counters returned by get_bytes_counter() and get_allocations_counter()

Returns:

top pair of counters <bytes, allocations> from the stack before the push

inline std::pair<counter, counter> pop_counters()

Pop a pair of counters from the stack.

Throws:

std::out_of_range – if the counter stack has fewer than two entries.

Returns:

top pair of counters <bytes, allocations> from the stack before the pop

struct counter
#include <statistics_resource_adaptor.hpp>

Utility struct for counting the current, peak, and total value of a number.

Public Functions

inline counter &operator+=(int64_t val)

Add val to the current value and update the peak value if necessary.

Parameters:

val – Value to add

Returns:

Reference to this object

inline counter &operator-=(int64_t val)

Subtract val from the current value and update the peak value if necessary.

Parameters:

val – Value to subtract

Returns:

Reference to this object

inline void add_counters_from_tracked_sub_block(const counter &val)

Add val to the current value and update the peak value if necessary.

When updating the peak value, we assume that val is tracking a code block inside the code block tracked by this. Because nested statistics are cascading, we have to convert val.peak to the peak it would have been if it was part of the statistics tracked by this. We do this by adding the current value that was active when val started tracking such that we get std::max(value + val.peak, peak).

Parameters:

val – Value to add

Public Members

int64_t value = {0}

Current value.

int64_t peak = {0}

Max value of value

int64_t total = {0}

Sum of all added values.

template<typename Upstream>
class thread_safe_resource_adaptor : public rmm::mr::device_memory_resource
#include <thread_safe_resource_adaptor.hpp>

Resource that adapts Upstream memory resource adaptor to be thread safe.

An instance of this resource can be constructured with an existing, upstream resource in order to satisfy allocation requests. This adaptor wraps allocations and deallocations from Upstream in a mutex lock.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Types

using lock_t = std::lock_guard<std::mutex>

Type of lock used to synchronize access.

Public Functions

inline thread_safe_resource_adaptor(device_async_resource_ref upstream)

Construct a new thread safe resource adaptor using upstream to satisfy allocation requests.

All allocations and frees are protected by a mutex lock

Parameters:

upstream – The resource used for allocating/deallocating device memory.

inline thread_safe_resource_adaptor(Upstream *upstream)

Construct a new thread safe resource adaptor using upstream to satisfy allocation requests.

All allocations and frees are protected by a mutex lock

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:

upstream – The resource used for allocating/deallocating device memory.

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

template<typename T>
class thrust_allocator : public thrust::device_malloc_allocator<T>
#include <thrust_allocator_adaptor.hpp>

An allocator compatible with Thrust containers and algorithms using a device_async_resource_ref for memory (de)allocation.

Unlike a device_async_resource_ref, thrust_allocator is typed and bound to allocate objects of a specific type T, but can be freely rebound to other types.

The allocator records the current cuda device and may only be used with a backing device_async_resource_ref valid for the same device.

Template Parameters:

T – The type of the objects that will be allocated by this allocator

Public Types

using Base = thrust::device_malloc_allocator<T>

The base type of this allocator.

using pointer = typename Base::pointer

The pointer type.

using size_type = typename Base::size_type

The size type.

Public Functions

thrust_allocator() = default

Default constructor creates an allocator using the default memory resource and default stream.

inline explicit thrust_allocator(cuda_stream_view stream)

Constructs a thrust_allocator using the default device memory resource and specified stream.

Parameters:

stream – The stream to be used for device memory (de)allocation

inline thrust_allocator(cuda_stream_view stream, rmm::device_async_resource_ref mr)

Constructs a thrust_allocator using a device memory resource and stream.

Parameters:
  • mr – The resource to be used for device memory allocation

  • stream – The stream to be used for device memory (de)allocation

template<typename U>
inline thrust_allocator(thrust_allocator<U> const &other)

Copy constructor. Copies the resource pointer and stream.

Parameters:

other – The thrust_allocator to copy

inline pointer allocate(size_type num)

Allocate objects of type T

Parameters:

num – The number of elements of type T to allocate

Returns:

pointer Pointer to the newly allocated storage

inline void deallocate(pointer ptr, size_type num)

Deallocates objects of type T

Parameters:
  • ptr – Pointer returned by a previous call to allocate

  • num – number of elements, must be equal to the argument passed to the prior allocate call that produced p

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

inline cuda_stream_view stream() const noexcept

The stream used by this allocator.

Returns:

The stream used by this allocator

Friends

inline friend void get_property(thrust_allocator const&, cuda::mr::device_accessible) noexcept

Enables the cuda::mr::device_accessible property.

This property declares that a thrust_allocator provides device accessible memory

template<typename U>
struct rebind
#include <thrust_allocator_adaptor.hpp>

Provides the type of a thrust_allocator instantiated with another type.

Template Parameters:

U – the other type to use for instantiation

Public Types

using other = thrust_allocator<U>

The type to bind to.

template<typename Upstream>
class tracking_resource_adaptor : public rmm::mr::device_memory_resource
#include <tracking_resource_adaptor.hpp>

Resource that uses Upstream to allocate memory and tracks allocations.

An instance of this resource can be constructed with an existing, upstream resource in order to satisfy allocation requests, but any existing allocations will be untracked. Tracking stores a size and pointer for every allocation, and a stack frame if capture_stacks is true, so it can add significant overhead. tracking_resource_adaptor is intended as a debug adaptor and shouldn’t be used in performance-sensitive code. Note that callstacks may not contain all symbols unless the project is linked with -rdynamic. This can be accomplished with add_link_options(-rdynamic) in cmake.

Template Parameters:

Upstream – Type of the upstream resource used for allocation/deallocation.

Public Types

using read_lock_t = std::shared_lock<std::shared_mutex>

Type of lock used to synchronize read access.

using write_lock_t = std::unique_lock<std::shared_mutex>

Type of lock used to synchronize write access.

Public Functions

inline tracking_resource_adaptor(device_async_resource_ref upstream, bool capture_stacks = false)

Construct a new tracking resource adaptor using upstream to satisfy allocation requests.

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • capture_stacks – If true, capture stacks for allocation calls

inline tracking_resource_adaptor(Upstream *upstream, bool capture_stacks = false)

Construct a new tracking resource adaptor using upstream to satisfy allocation requests.

Throws:

rmm::logic_error – if upstream == nullptr

Parameters:
  • upstream – The resource used for allocating/deallocating device memory

  • capture_stacks – If true, capture stacks for allocation calls

tracking_resource_adaptor(tracking_resource_adaptor&&) noexcept = default

Default move constructor.

tracking_resource_adaptor &operator=(tracking_resource_adaptor&&) noexcept = default

Default move assignment operator.

Returns:

tracking_resource_adaptor& Reference to the assigned object

inline rmm::device_async_resource_ref get_upstream_resource() const noexcept

rmm::device_async_resource_ref to the upstream resource

Returns:

rmm::device_async_resource_ref to the upstream resource

inline std::map<void*, allocation_info> const &get_outstanding_allocations() const noexcept

Get the outstanding allocations map.

Returns:

std::map<void*, allocation_info> const& of a map of allocations. The key is the allocated memory pointer and the data is the allocation_info structure, which contains size and, potentially, stack traces.

inline std::size_t get_allocated_bytes() const noexcept

Query the number of bytes that have been allocated. Note that this can not be used to know how large of an allocation is possible due to both possible fragmentation and also internal page sizes and alignment that is not tracked by this allocator.

Returns:

std::size_t number of bytes that have been allocated through this allocator.

inline std::string get_outstanding_allocations_str() const

Gets a string containing the outstanding allocation pointers, their size, and optionally the stack trace for when each pointer was allocated.

Stack traces are only included if this resource adaptor was created with capture_stack == true. Otherwise, outstanding allocation pointers will be shown with their size and empty stack traces.

Returns:

std::string Containing the outstanding allocation pointers.

inline void log_outstanding_allocations() const

Log any outstanding allocations via RMM_LOG_DEBUG.

struct allocation_info
#include <tracking_resource_adaptor.hpp>

Information stored about an allocation. Includes the size and a stack trace if the tracking_resource_adaptor was initialized to capture stacks.

Public Functions

inline allocation_info(std::size_t size, bool capture_stack)

Construct a new allocation info object.

Parameters:
  • size – Size of the allocation

  • capture_stack – If true, capture the stack trace for the allocation

Public Members

std::unique_ptr<rmm::detail::stack_trace> strace

Stack trace of the allocation.

std::size_t allocation_size

Size of the allocation.