rmm::mr::arena_memory_resource Class Reference

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

#include <arena_memory_resource.hpp>


Public Member Functions

 arena_memory_resource (cuda::mr::any_resource< cuda::mr::device_accessible > upstream, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)
 Construct an arena_memory_resource.
 

Friends

void get_property (arena_memory_resource const &, cuda::mr::device_accessible) noexcept
 Enables the cuda::mr::device_accessible property.
 

Detailed Description

A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.

Allocation and deallocation are thread-safe. This class is also compatible with the CUDA per-thread default stream.

GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.

Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.

In real-world applications, allocation sizes tend to follow a power-law distribution: large allocations are rare, but small ones are common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.

This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.

This class is copyable and shares ownership of its internal state via cuda::mr::shared_resource.

See also
Wilson, P. R., Johnstone, M. S., Neely, M., & Boles, D. (1995, September). Dynamic storage allocation: A survey and critical review. In International Workshop on Memory Management (pp. 1-116). Springer, Berlin, Heidelberg.
Berger, E. D., McKinley, K. S., Blumofe, R. D., & Wilson, P. R. (2000). Hoard: A scalable memory allocator for multithreaded applications. ACM Sigplan Notices, 35(11), 117-128.
Evans, J. (2006, April). A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada.
https://sourceware.org/glibc/wiki/MallocInternals
http://hoard.org/
http://jemalloc.net/
https://github.com/google/tcmalloc

Constructor & Destructor Documentation

◆ arena_memory_resource()

explicit rmm::mr::arena_memory_resource::arena_memory_resource(
    cuda::mr::any_resource<cuda::mr::device_accessible> upstream,
    std::optional<std::size_t> arena_size = std::nullopt,
    bool dump_log_on_failure = false)

Construct an arena_memory_resource.

Parameters
    upstream — The resource from which to allocate blocks for the global arena.
    arena_size — Size in bytes of the global arena. Defaults to half of the available memory on the current device.
    dump_log_on_failure — If true, dump the memory log when running out of memory.
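A minimal construction sketch, under the assumption that the upstream resource is a `rmm::mr::cuda_memory_resource` and that headers live under RMM's `rmm/mr/device/` layout; this requires a CUDA device at runtime and is illustrative only.

```cpp
#include <rmm/mr/device/arena_memory_resource.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>

#include <cstddef>
#include <optional>

int main() {
  // Upstream resource that allocates directly from the CUDA runtime.
  rmm::mr::cuda_memory_resource upstream;

  // Default arena: half of the available memory on the current device.
  rmm::mr::arena_memory_resource mr{upstream};

  // Explicit 1 GiB global arena, dumping the memory log on failure.
  rmm::mr::arena_memory_resource logged_mr{
      upstream, std::size_t{1} << 30, /*dump_log_on_failure=*/true};
  return 0;
}
```

Because the class shares ownership of its internal state via `cuda::mr::shared_resource`, copies of `mr` refer to the same arenas rather than creating new ones.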

The documentation for this class was generated from the following file: arena_memory_resource.hpp