A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.
#include <arena_memory_resource.hpp>


Public Member Functions
arena_memory_resource (cuda::mr::any_resource< cuda::mr::device_accessible > upstream, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)
    Construct an arena_memory_resource.
Friends
void get_property (arena_memory_resource const &, cuda::mr::device_accessible) noexcept
    Enables the cuda::mr::device_accessible property.
A suballocator that emphasizes fragmentation avoidance and scalable concurrency support.
Allocation and deallocation are thread-safe. This class is also compatible with the CUDA per-thread default stream.
GPU memory is divided into a global arena, per-thread arenas for default streams, and per-stream arenas for non-default streams. Each arena allocates memory from the global arena in chunks called superblocks.
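The arena layout described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `arena`, `arena_registry`, and `get_arena` are hypothetical names, the stream is modelled as an opaque integer handle, and a handle of `0` is assumed to stand for the per-thread default stream.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an arena; a real arena would own superblocks
// obtained from the global arena.
struct arena {
  int id;
};

// Hypothetical registry that picks the arena serving a given request.
class arena_registry {
 public:
  // `stream` is an opaque handle. Assumption: 0 means the default stream,
  // which maps to an arena owned by the calling thread. Any other value
  // maps to an arena owned by that stream.
  arena& get_arena(std::uintptr_t stream) {
    std::lock_guard<std::mutex> lock(mtx_);
    if (stream == 0) { return find_or_create(thread_arenas_, std::this_thread::get_id()); }
    return find_or_create(stream_arenas_, stream);
  }

 private:
  // Look up the arena for `key`, creating it on first use.
  // References into a std::map stay valid across later insertions.
  template <typename Map, typename Key>
  arena& find_or_create(Map& arenas, Key const& key) {
    auto it = arenas.find(key);
    if (it == arenas.end()) { it = arenas.emplace(key, arena{next_id_++}).first; }
    return it->second;
  }

  std::mutex mtx_;
  int next_id_{0};
  std::map<std::thread::id, arena> thread_arenas_;
  std::map<std::uintptr_t, arena> stream_arenas_;
};
```

In this sketch a single mutex guards the lookup only; the point is that threads using the default stream and distinct non-default streams end up with separate arenas and therefore do not contend on the same free list.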
Blocks in each arena are allocated using address-ordered first fit. When a block is freed, it is coalesced with neighbouring free blocks if the addresses are contiguous. Free superblocks are returned to the global arena.
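A simplified sketch of address-ordered first fit with coalescing is shown here for illustration. It tracks free blocks by offset rather than by device pointer, and the `free_list` class with its member names is hypothetical; the actual arena also manages superblocks and alignment, which are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical free list for one arena: maps block start offset -> block size.
// std::map keeps keys sorted, so iteration visits free blocks in address order.
class free_list {
 public:
  explicit free_list(std::size_t capacity) { blocks_[0] = capacity; }

  // Address-ordered first fit: take the lowest-addressed free block that is
  // large enough, and leave any unused tail on the free list.
  std::optional<std::size_t> allocate(std::size_t size) {
    assert(size > 0);
    for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
      if (it->second < size) { continue; }
      std::size_t const offset    = it->first;
      std::size_t const remainder = it->second - size;
      blocks_.erase(it);
      if (remainder > 0) { blocks_[offset + size] = remainder; }
      return offset;
    }
    return std::nullopt;  // no free block is large enough
  }

  // Return a block to the free list and merge it with free neighbours
  // whose addresses are contiguous with it.
  void deallocate(std::size_t offset, std::size_t size) {
    auto [it, inserted] = blocks_.emplace(offset, size);
    assert(inserted);
    (void)inserted;
    // Merge with the following block if it starts where this one ends.
    auto next = std::next(it);
    if (next != blocks_.end() && it->first + it->second == next->first) {
      it->second += next->second;
      blocks_.erase(next);
    }
    // Merge with the preceding block if it ends where this one starts.
    if (it != blocks_.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second == it->first) {
        prev->second += it->second;
        blocks_.erase(it);
      }
    }
  }

  std::size_t free_block_count() const { return blocks_.size(); }

 private:
  std::map<std::size_t, std::size_t> blocks_;
};
```

Once every block has been freed, coalescing collapses the list back into a single block spanning the whole region, which is the condition under which a free superblock could be handed back to the global arena.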
In real-world applications, allocation sizes tend to follow a power law distribution in which large allocations are rare, but small ones quite common. By handling small allocations in the per-thread arena, adequate performance can be achieved without introducing excessive memory fragmentation under high concurrency.
This design is inspired by several existing CPU memory allocators targeting multi-threaded applications (glibc malloc, Hoard, jemalloc, TCMalloc), albeit in a simpler form. Possible future improvements include using size classes, allocation caches, and more fine-grained locking or lock-free approaches.
This class is copyable and shares ownership of its internal state via cuda::mr::shared_resource.
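explicit arena_memory_resource (cuda::mr::any_resource< cuda::mr::device_accessible > upstream, std::optional< std::size_t > arena_size=std::nullopt, bool dump_log_on_failure=false)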
Construct an arena_memory_resource.
| upstream | The resource from which to allocate blocks for the global arena. |
| arena_size | Size in bytes of the global arena. Defaults to half of the available memory on the current device. |
| dump_log_on_failure | If true, dump the memory log when the resource runs out of memory. |
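The `arena_size` default can be read as the following rule, sketched here with a hypothetical helper. `resolve_arena_size` and `available_bytes` are illustrative names only; in practice the available memory would come from a device query, and the library may apply further alignment or rounding that this sketch ignores.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: use the requested size if one was given, otherwise
// fall back to half of the memory currently available on the device.
std::size_t resolve_arena_size(std::optional<std::size_t> arena_size,
                               std::size_t available_bytes) {
  return arena_size.value_or(available_bytes / 2);
}
```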