# User Guide

Achieving optimal performance in GPU-centric workflows frequently requires customizing how GPU ("device") memory is allocated.

RMM is a package that enables you to allocate device memory in a highly configurable way. For example, it enables you to allocate and use pools of GPU memory, or to use [managed memory](https://developer.nvidia.com/blog/unified-memory-cuda-beginners/) for allocations. You can also easily configure other libraries like Numba and CuPy to use RMM for allocating device memory.

## Installation

See the project [README](https://github.com/rapidsai/rmm) for how to install RMM.

## Using RMM

There are two ways to use RMM in Python code:

1. Using the `rmm.DeviceBuffer` API to explicitly create and manage device memory allocations
2. Transparently via external libraries such as CuPy and Numba

RMM provides a `MemoryResource` abstraction to control _how_ device memory is allocated in both of the above uses.

### DeviceBuffers

A DeviceBuffer represents an **untyped, uninitialized device memory allocation**. DeviceBuffers can be created by providing the size of the allocation in bytes:

```python
>>> import rmm
>>> buf = rmm.DeviceBuffer(size=100)
```

The size of the allocation and the memory address associated with it can be accessed via the `.size` and `.ptr` attributes respectively:

```python
>>> buf.size
100
>>> buf.ptr
140202544726016
```

DeviceBuffers can also be created by copying data from host memory:

```python
>>> import rmm
>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype='float64')
>>> buf = rmm.DeviceBuffer.to_device(a.view("int8"))  # to_device expects an 8-bit type or `bytes`
>>> buf.size
24
```

Conversely, the data underlying a DeviceBuffer can be copied to the host:

```python
>>> np.frombuffer(buf.tobytes())
array([1., 2., 3.])
```

### MemoryResource objects

`MemoryResource` objects are used to configure how device memory allocations are made by RMM.

By default, if a `MemoryResource` is not set explicitly, RMM uses the `CudaMemoryResource`, which uses `cudaMalloc` for allocating device memory.

`rmm.reinitialize()` provides an easy way to initialize RMM with specific memory resource options across multiple devices. See `help(rmm.reinitialize)` for full details.

For lower-level control, the `rmm.mr.set_current_device_resource()` function can be used to set a different MemoryResource for the current CUDA device. For example, enabling the `ManagedMemoryResource` tells RMM to use `cudaMallocManaged` instead of `cudaMalloc` for allocating memory:

```python
>>> import rmm
>>> rmm.mr.set_current_device_resource(rmm.mr.ManagedMemoryResource())
```

> :warning: The default resource must be set for any device **before**
> allocating any device memory on that device. Setting or changing the
> resource after device allocations have been made can lead to unexpected
> behaviour or crashes.

As another example, `PoolMemoryResource` allows you to allocate a large "pool" of device memory up-front. Subsequent allocations will draw from this pool of already allocated memory. The example below shows how to construct a PoolMemoryResource with an initial size of 1 GiB and a maximum size of 4 GiB. The pool uses `CudaMemoryResource` as its underlying ("upstream") memory resource:

```python
>>> import rmm
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.CudaMemoryResource(),
...     initial_pool_size=2**30,
...     maximum_pool_size=2**32
... )
>>> rmm.mr.set_current_device_resource(pool)
```
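Once the pool is set as the current device resource, subsequent allocations (such as creating a `DeviceBuffer`) are served from the already-allocated pool rather than by separate `cudaMalloc` calls. A minimal sketch:

```python
>>> import rmm
>>> # This 1 MiB allocation is sub-allocated from the pool constructed
>>> # above, so it does not require a new cudaMalloc call.
>>> buf = rmm.DeviceBuffer(size=2**20)
>>> buf.size
1048576
```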
Similarly, to use a pool of managed memory:

```python
>>> import rmm
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.ManagedMemoryResource(),
...     initial_pool_size=2**30,
...     maximum_pool_size=2**32
... )
>>> rmm.mr.set_current_device_resource(pool)
```

Other MemoryResources include:

* `FixedSizeMemoryResource` for allocating fixed blocks of memory
* `BinningMemoryResource` for allocating blocks within specified "bin" sizes from different memory resources

MemoryResources are highly configurable and can be composed together in different ways. See `help(rmm.mr)` for more information.

## Using RMM with third-party libraries

A number of libraries provide hooks to control their device allocations. RMM provides implementations of these for [CuPy](https://cupy.dev), [numba](https://numba.readthedocs.io/en/stable/), and [PyTorch](https://pytorch.org) in the `rmm.allocators` submodule. All of these approaches configure the library to use the _current_ RMM memory resource for device allocations.

### Using RMM with CuPy

You can configure [CuPy](https://cupy.dev/) to use RMM for memory allocations by setting the CuPy CUDA allocator to `rmm.allocators.cupy.rmm_cupy_allocator`:

```python
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)
```

### Using RMM with Numba

You can configure [Numba](https://numba.readthedocs.io/en/stable/) to use RMM for memory allocations using the Numba [EMM Plugin](https://numba.readthedocs.io/en/stable/cuda/external-memory.html#setting-emm-plugin).

This can be done in two ways:

1. Setting the environment variable `NUMBA_CUDA_MEMORY_MANAGER`:

   ```bash
   $ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)
   ```

2. Using the `set_memory_manager()` function provided by Numba:

   ```python
   >>> from numba import cuda
   >>> from rmm.allocators.numba import RMMNumbaManager
   >>> cuda.set_memory_manager(RMMNumbaManager)
   ```

### Using RMM with PyTorch

You can configure [PyTorch](https://pytorch.org/docs/stable/notes/cuda.html) to use RMM for memory allocations by configuring the current allocator:

```python
from rmm.allocators.torch import rmm_torch_allocator
import torch

torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```
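After the allocator has been changed, device memory for PyTorch CUDA tensors is obtained through the current RMM memory resource. A minimal usage sketch; note that PyTorch generally requires the allocator to be changed before any CUDA memory has been allocated in the process, so this should be one of the first things your program does:

```python
# Assumes torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
# was already called, before any CUDA memory was allocated.
import torch

# This tensor's device memory now comes from RMM's current
# memory resource (e.g. a pool, if one has been configured).
x = torch.zeros(1024, device="cuda")
```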