rmm::mr::arena_memory_resource Class Reference

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

#include <arena_memory_resource.hpp>


Public Member Functions

 arena_memory_resource (cuda::mr::any_resource< cuda::mr::device_accessible > upstream, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)
 Construct an arena_memory_resource.
 

Friends

void get_property (arena_memory_resource const &, cuda::mr::device_accessible) noexcept
 Enables the cuda::mr::device_accessible property.
 

Detailed Description

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

Allocation and deallocation are thread-safe. This class is also compatible with the CUDA per-thread default stream.

GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.

Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.

In real-world applications, allocation sizes tend to follow a power-law distribution: large allocations are rare, but small ones are common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.

This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.

This class is copyable and shares ownership of its internal state via cuda::mr::shared_resource.

See also
Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.
Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM Sigplan Notices, 35(11), 117-128.
Evans, J. (2006, April). A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada.
https://sourceware.org/glibc/wiki/MallocInternals
http://hoard.org/
http://jemalloc.net/
https://github.com/google/tcmalloc

Constructor & Destructor Documentation

◆ arena_memory_resource()

explicit rmm::mr::arena_memory_resource::arena_memory_resource(
    cuda::mr::any_resource<cuda::mr::device_accessible> upstream,
    std::optional<std::size_t> arena_size = std::nullopt,
    bool dump_log_on_failure = false)

Construct an arena_memory_resource.

Parameters
    upstream — The resource from which to allocate blocks for the global arena.
    arena_size — Size in bytes of the global arena. Defaults to half of the available memory on the current device.
    dump_log_on_failure — If true, dump the memory log when running out of memory.
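A minimal construction sketch, under the assumption that the upstream resource is a `rmm::mr::cuda_memory_resource` and that headers live under RMM's `rmm/mr/device/` layout; this requires a CUDA device at runtime and is illustrative only.

```cpp
#include <rmm/mr/device/arena_memory_resource.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>

#include <cstddef>
#include <optional>

int main() {
  // Upstream resource that allocates directly from the CUDA runtime.
  rmm::mr::cuda_memory_resource upstream;

  // Default arena: half of the available memory on the current device.
  rmm::mr::arena_memory_resource mr{upstream};

  // Explicit 1 GiB global arena, dumping the memory log on failure.
  rmm::mr::arena_memory_resource logged_mr{
      upstream, std::size_t{1} << 30, /*dump_log_on_failure=*/true};
  return 0;
}
```

Because the class shares ownership of its internal state via `cuda::mr::shared_resource`, copies of `mr` refer to the same arenas rather than creating new ones.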

The documentation for this class was generated from the following file: arena_memory_resource.hpp