rmm::mr::arena_memory_resource< Upstream > Class Template Reference (final)

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support. More...

#include <arena_memory_resource.hpp>

Inheritance diagram for rmm::mr::arena_memory_resource< Upstream >: inherits rmm::mr::device_memory_resource.
Collaboration diagram for rmm::mr::arena_memory_resource< Upstream >.

Public Member Functions

 arena_memory_resource (Upstream *upstream_mr, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)
 Construct an arena_memory_resource. More...
 
 arena_memory_resource (arena_memory_resource const &)=delete
 
arena_memory_resource & operator= (arena_memory_resource const &)=delete
 
 arena_memory_resource (arena_memory_resource &&) noexcept=delete
 
arena_memory_resource & operator= (arena_memory_resource &&) noexcept=delete
 
bool supports_streams () const noexcept override
 Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation. More...
 
bool supports_get_mem_info () const noexcept override
 Query whether the resource supports the get_mem_info API. More...
 
- Public Member Functions inherited from rmm::mr::device_memory_resource
 device_memory_resource (device_memory_resource const &)=default
 Default copy constructor.
 
 device_memory_resource (device_memory_resource &&) noexcept=default
 Default move constructor.
 
device_memory_resource & operator= (device_memory_resource const &)=default
 Default copy assignment operator. More...
 
device_memory_resource & operator= (device_memory_resource &&) noexcept=default
 Default move assignment operator. More...
 
void * allocate (std::size_t bytes, cuda_stream_view stream=cuda_stream_view{})
 Allocates memory of size at least bytes. More...
 
void deallocate (void *ptr, std::size_t bytes, cuda_stream_view stream=cuda_stream_view{})
 Deallocate memory pointed to by ptr. More...
 
bool is_equal (device_memory_resource const &other) const noexcept
 Compare this resource to another. More...
 
void * allocate (std::size_t bytes, std::size_t alignment)
 Allocates memory of size at least bytes. More...
 
void deallocate (void *ptr, std::size_t bytes, std::size_t alignment)
 Deallocate memory pointed to by ptr. More...
 
void * allocate_async (std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
 Allocates memory of size at least bytes. More...
 
void * allocate_async (std::size_t bytes, cuda_stream_view stream)
 Allocates memory of size at least bytes. More...
 
void deallocate_async (void *ptr, std::size_t bytes, std::size_t alignment, cuda_stream_view stream)
 Deallocate memory pointed to by ptr. More...
 
void deallocate_async (void *ptr, std::size_t bytes, cuda_stream_view stream)
 Deallocate memory pointed to by ptr. More...
 
bool operator== (device_memory_resource const &other) const noexcept
 Comparison operator with another device_memory_resource. More...
 
bool operator!= (device_memory_resource const &other) const noexcept
 Comparison operator with another device_memory_resource. More...
 
std::pair< std::size_t, std::size_t > get_mem_info (cuda_stream_view stream) const
 Queries the amount of free and total memory for the resource. More...
 

Detailed Description

template<typename Upstream>
class rmm::mr::arena_memory_resource< Upstream >

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

Allocation (do_allocate()) and deallocation (do_deallocate()) are thread-safe. Also, this class is compatible with the CUDA per-thread default stream.

GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.

Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.

In real-world applications, allocation sizes tend to follow a power law distribution in which large allocations are rare, but small ones quite common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.

This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.
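For illustration, the following is a minimal sketch (not taken from the generated documentation) of concurrent use: several host threads allocate and deallocate through the same resource, each on its own non-default stream. It assumes rmm::mr::cuda_memory_resource as the upstream resource and the usual RMM headers; exact header paths and APIs may vary across RMM versions.

    #include <rmm/cuda_stream.hpp>
    #include <rmm/mr/device/arena_memory_resource.hpp>
    #include <rmm/mr/device/cuda_memory_resource.hpp>

    #include <thread>
    #include <vector>

    int main()
    {
      rmm::mr::cuda_memory_resource upstream{};
      rmm::mr::arena_memory_resource<rmm::mr::cuda_memory_resource> mr{&upstream};

      // Each thread allocates and frees on its own non-default stream; allocation and
      // deallocation are thread-safe, and each non-default stream is served by its own arena.
      auto worker = [&mr] {
        rmm::cuda_stream stream;  // owns a new CUDA stream
        for (int i = 0; i < 1000; ++i) {
          void* p = mr.allocate(256, stream.view());
          mr.deallocate(p, 256, stream.view());
        }
        stream.synchronize();
      };

      std::vector<std::thread> workers;
      for (int i = 0; i < 4; ++i) { workers.emplace_back(worker); }
      for (auto& t : workers) { t.join(); }
      return 0;
    }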

See also
Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.
Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM Sigplan Notices, 35(11), 117-128.
Evans, J. (2006, April). A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada.
https://sourceware.org/glibc/wiki/MallocInternals
http://hoard.org/
http://jemalloc.net/
https://github.com/google/tcmalloc
Template Parameters
Upstream    Memory resource to use for allocating memory for the global arena. Implements the rmm::mr::device_memory_resource interface.
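
A minimal construction sketch, assuming rmm::mr::cuda_memory_resource as the upstream resource (header paths and helpers such as rmm::mr::set_current_device_resource may differ between RMM versions):

    #include <rmm/cuda_stream_view.hpp>
    #include <rmm/device_buffer.hpp>
    #include <rmm/mr/device/arena_memory_resource.hpp>
    #include <rmm/mr/device/cuda_memory_resource.hpp>
    #include <rmm/mr/device/per_device_resource.hpp>

    int main()
    {
      // Upstream resource from which the global arena is allocated.
      rmm::mr::cuda_memory_resource upstream{};

      // Arena resource with the default size (half of the available device memory).
      rmm::mr::arena_memory_resource<rmm::mr::cuda_memory_resource> mr{&upstream};

      // Route RMM allocations on the current device through the arena.
      rmm::mr::set_current_device_resource(&mr);

      // This allocation is now served by the arena resource.
      rmm::device_buffer buf(1 << 20, rmm::cuda_stream_view{});
      return 0;
    }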

Constructor & Destructor Documentation

◆ arena_memory_resource()

template<typename Upstream >
rmm::mr::arena_memory_resource< Upstream >::arena_memory_resource ( Upstream *  upstream_mr,
std::optional< std::size_t >  arena_size = std::nullopt,
bool  dump_log_on_failure = false 
)
inline explicit

Construct an arena_memory_resource.

Exceptions
rmm::logic_error    if upstream_mr == nullptr.
Parameters
upstream_mr    The memory resource from which to allocate blocks for the global arena.
arena_size    Size in bytes of the global arena. Defaults to half of the available memory on the current device.
dump_log_on_failure    If true, dump the memory log when running out of memory.
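
As an illustrative sketch of these parameters (the 4 GiB arena size is arbitrary, chosen only for the example):

    #include <rmm/mr/device/arena_memory_resource.hpp>
    #include <rmm/mr/device/cuda_memory_resource.hpp>

    #include <cstddef>

    int main()
    {
      rmm::mr::cuda_memory_resource upstream{};

      // Explicit 4 GiB global arena instead of the default (half of the available
      // device memory); dump the memory log if an allocation ever fails.
      constexpr std::size_t arena_size = std::size_t{4} << 30;
      rmm::mr::arena_memory_resource<rmm::mr::cuda_memory_resource> mr{
        &upstream, arena_size, /*dump_log_on_failure=*/true};
      return 0;
    }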

Member Function Documentation

◆ supports_get_mem_info()

template<typename Upstream >
bool rmm::mr::arena_memory_resource< Upstream >::supports_get_mem_info ( ) const
inline override virtual noexcept

Query whether the resource supports the get_mem_info API.

Returns
bool false.

Implements rmm::mr::device_memory_resource.

◆ supports_streams()

template<typename Upstream >
bool rmm::mr::arena_memory_resource< Upstream >::supports_streams ( ) const
inline override virtual noexcept

Queries whether the resource supports use of non-null CUDA streams for allocation/deallocation.

Returns
bool true.

Implements rmm::mr::device_memory_resource.


The documentation for this class was generated from the following file: