Data Containers

group data_containers
class device_buffer
#include <device_buffer.hpp>

RAII construct for device memory allocation.

This class allocates untyped and uninitialized device memory using a device_async_resource_ref. If not explicitly specified, the memory resource returned from get_current_device_resource_ref() is used.

Examples:

// Allocates at least 100 bytes of device memory using the default memory
// resource and the default stream.
device_buffer buff(100, cuda_stream_view{});

// allocates at least 100 bytes using the custom memory resource and
// specified stream
custom_memory_resource mr;
cuda_stream_view stream = cuda_stream_view{};
device_buffer custom_buff(100, stream, &mr);

// deep copies `buff` into a new device buffer using the specified stream
device_buffer buff_copy(buff, stream);

// Moves the memory in `from_buff` to `to_buff`. Deallocates previously allocated
// `to_buff` memory on `to_buff.stream()`.
to_buff = std::move(from_buff);

// moves `buff` into a new device_buffer; `buff` is now empty
device_buffer buff_move(std::move(buff));

// Default construction. Buffer is empty
device_buffer buff_default{};

// If the requested size is larger than the current size, resizes allocation to the new size and
// deep copies any previous contents. Otherwise, simply updates the value of `size()` to the
// newly requested size without any allocations or copies. Uses the specified stream.
buff_default.resize(100, stream);

Note

Unlike std::vector or thrust::device_vector, the device memory allocated by a device_buffer is uninitialized. Therefore, it is undefined behavior to read the contents of data() before first initializing it.

Public Functions

inline device_buffer()

Default constructor creates an empty device_buffer

inline explicit device_buffer(std::size_t size, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Constructs a new device buffer with size bytes of uninitialized device memory.

Throws:

rmm::bad_alloc – If allocation fails.

Parameters:
  • size – Size in bytes to allocate in device memory.

  • stream – CUDA stream on which memory may be allocated if the memory resource supports streams.

  • mr – Memory resource to use for the device memory allocation.

inline device_buffer(void const *source_data, std::size_t size, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device buffer by copying from a raw pointer to an existing host or device memory allocation.

Note

This function does not synchronize stream. source_data is copied on stream, so the caller is responsible for correct synchronization to ensure that source_data is valid when the copy occurs. This includes destroying source_data in stream order after this function is called, or synchronizing or waiting on stream after this function returns as necessary.

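Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – If copying from source_data fails
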
Parameters:
  • source_data – Pointer to the host or device memory to copy from.

  • size – Size in bytes to copy.

  • stream – CUDA stream on which memory may be allocated if the memory resource supports streams.

  • mr – Memory resource to use for the device memory allocation

inline device_buffer(device_buffer const &other, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device_buffer by deep copying the contents of another device_buffer, optionally using the specified stream and memory resource.

Note

Only copies other.size() bytes from other, i.e., if other.size() != other.capacity(), then the size and capacity of the newly constructed device_buffer will be equal to other.size().

Note

This function does not synchronize stream. other is copied on stream, so the caller is responsible for correct synchronization to ensure that other is valid when the copy occurs. This includes destroying other in stream order after this function is called, or synchronizing or waiting on stream after this function returns as necessary.

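Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – If copying from other fails
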
Parameters:
  • other – The device_buffer whose contents will be copied

  • stream – The stream to use for the allocation and copy

  • mr – The resource to use for allocating the new device_buffer

inline device_buffer(device_buffer &&other) noexcept

Constructs a new device_buffer by moving the contents of another device_buffer into the newly constructed one.

After the new device_buffer is constructed, other is modified to be a valid, empty device_buffer, i.e., data() returns nullptr, and size() and capacity() are zero.

Parameters:

other – The device_buffer whose contents will be moved into the newly constructed one.
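
The moved-from state described above can be sketched on the host. This is a minimal model, not the library's implementation: handle_model and its fields are hypothetical names, and no device memory is involved. It only shows the usual std::exchange idiom for leaving the source valid and empty.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a buffer handle. It owns nothing and only mirrors
// the three observable values named in the text: data(), size(), capacity().
struct handle_model {
  void* data{nullptr};
  std::size_t size{0};
  std::size_t capacity{0};

  handle_model() = default;
  handle_model(void* d, std::size_t s, std::size_t c) : data{d}, size{s}, capacity{c} {}

  // Take the fields from `other` and reset `other` to the empty state.
  handle_model(handle_model&& other) noexcept
    : data{std::exchange(other.data, nullptr)},
      size{std::exchange(other.size, 0)},
      capacity{std::exchange(other.capacity, 0)}
  {
  }
};
```

After `handle_model to(std::move(from));`, `from` reports a null pointer and zero size and capacity, which mirrors the state documented for a moved-from device_buffer.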

inline device_buffer &operator=(device_buffer &&other) noexcept

Move assignment operator moves the contents from other.

This device_buffer’s current device memory allocation will be deallocated on stream().

If a different stream is required, call set_stream() on the instance before assignment. After assignment, this instance’s stream is replaced by other.stream().

Parameters:

other – The device_buffer whose contents will be moved.

Returns:

A reference to this device_buffer

inline ~device_buffer() noexcept

Destroy the device buffer object.

Note

If the memory resource supports streams, this destructor deallocates using the stream most recently passed to any of this device buffer’s methods.

inline void reserve(std::size_t new_capacity, cuda_stream_view stream)

Increase the capacity of the device memory allocation.

If the requested new_capacity is less than or equal to capacity(), no action is taken.

If new_capacity is larger than capacity(), a new allocation is made on stream to satisfy new_capacity, and the contents of the old allocation are copied on stream to the new allocation. The old allocation is then freed. The bytes from [size(), new_capacity) are uninitialized.

Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – if the copy from the old to new allocation fails

Parameters:
  • new_capacity – The requested new capacity, in bytes

  • stream – The stream to use for allocation and copy

inline void resize(std::size_t new_size, cuda_stream_view stream)

Resize the device memory allocation.

If the requested new_size is less than or equal to capacity(), no action is taken other than updating the value that is returned from size(). Specifically, no memory is allocated nor copied. The value capacity() remains the actual size of the device memory allocation.

If new_size is larger than capacity(), a new allocation is made on stream to satisfy new_size, and the contents of the old allocation are copied on stream to the new allocation. The old allocation is then freed. The bytes from [old_size, new_size) are uninitialized.

The invariant size() <= capacity() holds.

Note

shrink_to_fit() may be used to force the deallocation of unused capacity().

Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – if the copy from the old to new allocation fails

Parameters:
  • new_size – The requested new size, in bytes

  • stream – The stream to use for allocation and copy

inline void shrink_to_fit(cuda_stream_view stream)

Forces the deallocation of unused memory.

Reallocates and copies, on stream, the contents of the device memory allocation to reduce capacity() to size().

If size() == capacity(), no allocations or copies occur.

Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – If the copy from the old to new allocation fails

Parameters:

stream – The stream on which the allocation and copy are performed
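
The interplay of reserve(), resize(), and shrink_to_fit() is easier to follow as bookkeeping. The sketch below is a host-only model under stated assumptions: buffer_model and run_sequence are hypothetical names, streams are omitted, and a counter stands in for each point where the text says a new allocation is made.

```cpp
#include <cassert>
#include <cstddef>

// Host-only model of the documented bookkeeping. It tracks sizes and counts
// reallocations; it is NOT rmm::device_buffer and touches no GPU memory.
struct buffer_model {
  std::size_t size{0};
  std::size_t capacity{0};
  int reallocations{0};  // how many times a new allocation would be made

  void reserve(std::size_t new_capacity)
  {
    if (new_capacity <= capacity) { return; }  // no action is taken
    capacity = new_capacity;                   // allocate, copy, free the old one
    ++reallocations;
  }

  void resize(std::size_t new_size)
  {
    if (new_size > capacity) {  // growing past capacity reallocates
      capacity = new_size;
      ++reallocations;
    }
    size = new_size;  // otherwise only the reported size changes
  }

  void shrink_to_fit()
  {
    if (size == capacity) { return; }  // nothing to trim
    capacity = size;                   // reallocate to exactly size
    ++reallocations;
  }
};

// Runs a short sequence and returns the final state.
inline buffer_model run_sequence()
{
  buffer_model b;
  b.resize(100);      // reallocates: size 100, capacity 100
  b.resize(40);       // size 40, capacity stays 100
  b.reserve(80);      // 80 <= 100: no action
  b.shrink_to_fit();  // reallocates: capacity drops to 40
  return b;
}
```

In this model the sequence ends with size and capacity both 40 after two reallocations. Resizing to zero keeps the capacity, which matches the remark that an empty buffer may still hold an allocation.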

inline void const *data() const noexcept

Const pointer to the device memory allocation.

Returns:

Const pointer to the device memory allocation

inline void *data() noexcept

Pointer to the device memory allocation.

Returns:

Pointer to the device memory allocation

inline std::size_t size() const noexcept

The number of bytes.

Returns:

The number of bytes

inline std::int64_t ssize() const noexcept

The signed number of bytes.

Returns:

The signed number of bytes
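
The documentation does not say why a signed size is offered. One plausible use, shown here as an assumption rather than the library's stated rationale, is index arithmetic that may pass through zero. The helpers reverse_visits and unsigned_wraps are hypothetical and run entirely on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts the indices visited by a reverse loop over `n` items.
// With a signed index, the loop terminates cleanly even when n is zero.
inline int reverse_visits(std::int64_t n)
{
  int visits = 0;
  for (std::int64_t i = n - 1; i >= 0; --i) { ++visits; }
  return visits;
}

// With an unsigned size, `size - 1` wraps to a huge value when size is zero.
inline bool unsigned_wraps()
{
  std::size_t const size = 0;
  return (size - 1) > size;
}
```

A caller could pass the value returned by ssize() to a loop shaped like reverse_visits without first checking for an empty buffer.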

inline bool is_empty() const noexcept

Whether or not the buffer currently holds any data.

If is_empty() == true, the device_buffer may still hold an allocation if capacity() > 0.

Returns:

Whether or not the buffer currently holds any data

inline std::size_t capacity() const noexcept

Returns actual size in bytes of device memory allocation.

The invariant size() <= capacity() holds.

Returns:

The actual size in bytes of the device memory allocation

inline cuda_stream_view stream() const noexcept

The stream most recently specified for allocation/deallocation.

Returns:

The stream most recently specified for allocation/deallocation

inline void set_stream(cuda_stream_view stream) noexcept

Sets the stream to be used for deallocation.

If no other rmm::device_buffer method that allocates memory is called after this call with a different stream argument, then stream will be used for deallocation in the rmm::device_buffer destructor. However, if either resize() or shrink_to_fit() is called after this, the later stream parameter will be stored and used in the destructor.

Parameters:

stream – The stream to use for deallocation
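
The "most recent stream wins" rule can be modelled without a GPU. In this sketch streams are plain integer ids and stream_tracking_model is a hypothetical type. It records only which stream a destructor would use, following the description above.

```cpp
#include <cassert>
#include <cstddef>

// Host-only model: the struct remembers which stream id the destructor
// would deallocate on. No memory is allocated.
struct stream_tracking_model {
  int dealloc_stream;

  explicit stream_tracking_model(int stream) : dealloc_stream{stream} {}

  void set_stream(int stream) noexcept { dealloc_stream = stream; }

  // Per the text, a later resize() or shrink_to_fit() stores its stream
  // argument, replacing whatever set_stream() recorded.
  void resize(std::size_t /*new_size*/, int stream) { dealloc_stream = stream; }
  void shrink_to_fit(int stream) { dealloc_stream = stream; }
};

// set_stream() alone: the stream it names is used for deallocation.
inline int after_set_stream_only()
{
  stream_tracking_model m{1};
  m.set_stream(2);
  return m.dealloc_stream;
}

// set_stream() followed by resize(): the later stream replaces it.
inline int after_set_stream_then_resize()
{
  stream_tracking_model m{1};
  m.set_stream(2);
  m.resize(64, 3);
  return m.dealloc_stream;
}
```

The practical consequence is that set_stream() should be the last stream-carrying call before destruction if a specific deallocation stream is required.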

inline rmm::device_async_resource_ref memory_resource() const noexcept

The resource used to allocate and deallocate.

Returns:

The resource used to allocate and deallocate

template<typename T>
class device_scalar
#include <device_scalar.hpp>

Container for a single object of type T in device memory.

T must be trivially copyable.

Template Parameters:

T – The object’s type

Public Types

using value_type = typename device_uvector<T>::value_type

T, the type of the scalar element.

using size_type = typename device_uvector<T>::size_type

The type used for the size.

using reference = typename device_uvector<T>::reference

value_type&

using const_reference = typename device_uvector<T>::const_reference

const value_type&

using pointer = typename device_uvector<T>::pointer

The type of the pointer returned by data()

using const_pointer = typename device_uvector<T>::const_pointer

The type of the pointer returned by data() const

Public Functions

RMM_EXEC_CHECK_DISABLE device_scalar(device_scalar&&) noexcept = default

Default move constructor.

device_scalar &operator=(device_scalar&&) noexcept = default

Default move assignment operator.

Returns:

device_scalar& A reference to the assigned-to object

device_scalar(device_scalar const&) = delete

Copy ctor is deleted as it doesn’t allow a stream argument.

device_scalar &operator=(device_scalar const&) = delete

Copy assignment is deleted as it doesn’t allow a stream argument.

device_scalar() = delete

Default constructor is deleted as it doesn’t allow a stream argument.

inline explicit device_scalar(cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new uninitialized device_scalar.

Does not synchronize the stream.

Note

This device_scalar is only safe to access in kernels and copies on the specified CUDA stream, or on another stream only if a dependency is enforced (e.g. using cudaStreamWaitEvent()).

Throws:

rmm::bad_alloc – if allocating the device memory fails.

Parameters:
  • stream – Stream on which to perform asynchronous allocation.

  • mr – Optional, resource with which to allocate.

inline explicit device_scalar(value_type const &initial_value, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device_scalar with an initial value.

Does not synchronize the stream.

Note

This device_scalar is only safe to access in kernels and copies on the specified CUDA stream, or on another stream only if a dependency is enforced (e.g. using cudaStreamWaitEvent()).

Throws:
  • rmm::bad_alloc – if allocating the device memory for initial_value fails.

  • rmm::cuda_error – if copying initial_value to device memory fails.

Parameters:
  • initial_value – The initial value of the object in device memory.

  • stream – Stream on which to perform allocation and copy.

  • mr – Optional, resource with which to allocate.

inline device_scalar(device_scalar const &other, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device_scalar by deep copying the contents of another device_scalar, using the specified stream and memory resource.

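Throws:
  • rmm::bad_alloc – If creating the new allocation fails

  • rmm::cuda_error – If copying from other fails
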
Parameters:
  • other – The device_scalar whose contents will be copied

  • stream – The stream to use for the allocation and copy

  • mr – The resource to use for allocating the new device_scalar

inline value_type value(cuda_stream_view stream) const

Copies the value from device to host, synchronizes, and returns the value.

Synchronizes stream after copying the data from device to host.

Note

If the stream specified to this function is different from the stream specified to the constructor, then an appropriate dependency must be inserted between the streams (e.g. using cudaStreamWaitEvent() or cudaStreamSynchronize()) before calling this function, otherwise there may be a race condition.

Throws:
  • rmm::cuda_error – If the copy fails

  • rmm::cuda_error – If synchronizing stream fails

Parameters:

stream – CUDA stream on which to perform the copy and synchronize.

Returns:

T The value of the scalar.

inline void set_value_async(value_type const &value, cuda_stream_view stream)

Sets the value of the device_scalar to value.

This specialization for fundamental types is optimized to use cudaMemsetAsync when value is zero.

This function does not synchronize stream before returning. Therefore, the object referenced by value should not be destroyed or modified until stream has been synchronized. Otherwise, behavior is undefined.

Example:

rmm::device_scalar<int32_t> s(stream);

int v{42};

// Copies 42 to device storage on `stream`. Does _not_ synchronize
s.set_value_async(v, stream);
...
cudaStreamSynchronize(stream);
// Synchronization is required before `v` can be modified
v = 13;

Note

If the stream specified to this function is different from the stream specified to the constructor, then appropriate dependencies must be inserted between the streams (e.g. using cudaStreamWaitEvent() or cudaStreamSynchronize()) before and after calling this function, otherwise there may be a race condition.

Note

This function incurs a host to device memcpy or device memset and should be used carefully.

Throws:

rmm::cuda_error – if copying value to device memory fails.

Parameters:
  • value – The host value which will be copied to device

  • stream – CUDA stream on which to perform the copy
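
The lifetime rule for the host value can be illustrated with a toy model of deferred work. Everything here is an assumption made for illustration: toy_stream and toy_scalar are hypothetical host types, and a real CUDA stream does not behave this deterministically. In real code the unsafe order is undefined behavior, not a predictable wrong answer.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for a stream: work is queued and runs only at synchronize().
struct toy_stream {
  std::vector<std::function<void()>> pending;

  void synchronize()
  {
    for (auto& task : pending) { task(); }
    pending.clear();
  }
};

// Hypothetical scalar whose "device" storage is a plain int.
struct toy_scalar {
  int storage{0};

  // Queues a copy that reads `value` later, when the stream runs the task.
  void set_value_async(int const& value, toy_stream& stream)
  {
    stream.pending.push_back([this, &value] { storage = value; });
  }
};

// Documented order: synchronize first, then modify the host value.
inline int safe_order()
{
  toy_stream stream;
  toy_scalar s;
  int v{42};
  s.set_value_async(v, stream);
  stream.synchronize();
  v = 13;
  return s.storage;
}

// Hazard: the host value changes before the queued copy has run.
inline int unsafe_order()
{
  toy_stream stream;
  toy_scalar s;
  int v{42};
  s.set_value_async(v, stream);
  v = 13;
  stream.synchronize();
  return s.storage;
}
```

In this toy model safe_order() stores 42 while unsafe_order() stores 13, because the queued copy reads the host variable only when the work actually runs.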

inline void set_value_to_zero_async(cuda_stream_view stream)

Sets the value of the device_scalar to zero on the specified stream.

This function does not synchronize stream before returning.

Note

If the stream specified to this function is different from the stream specified to the constructor, then appropriate dependencies must be inserted between the streams (e.g. using cudaStreamWaitEvent() or cudaStreamSynchronize()) before and after calling this function, otherwise there may be a race condition.

Note

This function incurs a device memset and should be used carefully.

Parameters:

stream – CUDA stream on which to perform the memset

inline pointer data() noexcept

Returns pointer to object in device memory.

Note

If the returned device pointer is used on a CUDA stream different from the stream specified to the constructor, then appropriate dependencies must be inserted between the streams (e.g. using cudaStreamWaitEvent() or cudaStreamSynchronize()), otherwise there may be a race condition.

Returns:

Pointer to underlying device memory

inline const_pointer data() const noexcept

Returns const pointer to object in device memory.

Note

If the returned device pointer is used on a CUDA stream different from the stream specified to the constructor, then appropriate dependencies must be inserted between the streams (e.g. using cudaStreamWaitEvent() or cudaStreamSynchronize()), otherwise there may be a race condition.

Returns:

Const pointer to underlying device memory

inline constexpr size_type size() const noexcept

The size of the scalar: always 1.

Returns:

The size of the scalar: always 1

inline cuda_stream_view stream() const noexcept

Stream associated with the device memory allocation.

Returns:

Stream associated with the device memory allocation

inline void set_stream(cuda_stream_view stream) noexcept

Sets the stream to be used for deallocation.

Parameters:

stream – Stream to be used for deallocation

template<typename T>
class device_uvector
#include <device_uvector.hpp>

An uninitialized vector of elements in device memory.

Similar to a thrust::device_vector, device_uvector is a random access container of elements stored contiguously in device memory. However, unlike thrust::device_vector, device_uvector does not default initialize the vector elements.

If initialization is desired, this must be done explicitly by the caller, e.g., with thrust::uninitialized_fill.

Example:

auto mr = new my_custom_resource();
rmm::cuda_stream_view s{};

// Allocates *uninitialized* device memory on stream `s` sufficient for 100 ints using the
// supplied resource `mr`
rmm::device_uvector<int> uv(100, s, mr);

// Initializes all elements to 0 on stream `s`
thrust::uninitialized_fill(thrust::cuda::par.on(s), uv.begin(), uv.end(), 0);

Avoiding default initialization improves performance by eliminating the kernel launch required to default initialize the elements. This initialization is often unnecessary, e.g., when the vector is created to hold some output from some operation.

However, this restricts the element type T to only trivially copyable types. In short, trivially copyable types can be safely copied with memcpy. For more information, see https://en.cppreference.com/w/cpp/types/is_trivially_copyable.
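
Whether a candidate element type meets this restriction can be checked at compile time with the standard type trait. The types point and named_point and the helper usable_as_element are hypothetical, introduced only for this check.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical element types used only for this check.
struct point {
  float x;
  float y;
};

struct named_point {
  std::string name;  // std::string is not trivially copyable
  float x;
  float y;
};

static_assert(std::is_trivially_copyable<int>::value, "int can be copied with memcpy");
static_assert(std::is_trivially_copyable<point>::value, "plain aggregates qualify");
static_assert(!std::is_trivially_copyable<named_point>::value,
              "a std::string member disqualifies the type");

// True when T satisfies the restriction described in the text.
template <typename T>
constexpr bool usable_as_element()
{
  return std::is_trivially_copyable<T>::value;
}
```

A type such as point would be acceptable as T, while named_point would not, because copying its std::string member requires running a copy constructor.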

Another key difference over thrust::device_vector is that all operations that invoke allocation, kernels, or memcpys take a CUDA stream parameter to indicate on which stream the operation will be performed.

Template Parameters:

T – Trivially copyable element type

Public Types

using value_type = T

T; stored value type.

using size_type = std::size_t

The type used for the size of the vector.

using reference = value_type&

value_type&; reference type returned by operator[](size_type)

using const_reference = value_type const&

value_type const&; constant reference type returned by operator[](size_type) const

using pointer = value_type*

The type of the pointer returned by data()

using const_pointer = value_type const*

The type of the pointer returned by data() const.

using iterator = pointer

The type of the iterator returned by begin()

using const_iterator = const_pointer

The type of the const iterator returned by cbegin()

Public Functions

RMM_EXEC_CHECK_DISABLE device_uvector(device_uvector&&) noexcept = default

Default move constructor.

device_uvector &operator=(device_uvector&&) noexcept = default

Default move assignment operator.

Returns:

device_uvector& Reference to the assigned object

device_uvector(device_uvector const&) = delete

Copy ctor is deleted as it doesn’t allow a stream argument.

device_uvector &operator=(device_uvector const&) = delete

Copy assignment is deleted as it doesn’t allow a stream argument.

device_uvector() = delete

Default constructor is deleted as it doesn’t allow a stream argument.

inline explicit device_uvector(std::size_t size, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device_uvector with sufficient uninitialized storage for size elements.

Elements are uninitialized. Reading an element before it is initialized results in undefined behavior.

Parameters:
  • size – The number of elements to allocate storage for

  • stream – The stream on which to perform the allocation

  • mr – The resource used to allocate the device storage

inline explicit device_uvector(device_uvector const &other, cuda_stream_view stream, device_async_resource_ref mr = mr::get_current_device_resource_ref())

Construct a new device_uvector by deep copying the contents of another device_uvector.

Elements are copied as if by memcpy, i.e., T’s copy constructor is not invoked.

Parameters:
  • other – The vector to copy from

  • stream – The stream on which to perform the copy

  • mr – The resource used to allocate device memory for the new vector

inline pointer element_ptr(std::size_t element_index) noexcept

Returns pointer to the specified element.

Behavior is undefined if element_index >= size().

Parameters:

element_index – Index of the specified element.

Returns:

T* Pointer to the desired element

inline const_pointer element_ptr(std::size_t element_index) const noexcept

Returns pointer to the specified element.

Behavior is undefined if element_index >= size().

Parameters:

element_index – Index of the specified element.

Returns:

T const* Pointer to the desired element

inline void set_element_async(std::size_t element_index, value_type const &value, cuda_stream_view stream)

Performs an asynchronous copy of value to the specified element in device memory.

This specialization for fundamental types is optimized to use cudaMemsetAsync when value is zero.

This function does not synchronize stream before returning. Therefore, the object referenced by value should not be destroyed or modified until stream has been synchronized. Otherwise, behavior is undefined.

Example:

rmm::device_uvector<int32_t> vec(100, stream);

int v{42};

// Copies 42 to element 0 on `stream`. Does _not_ synchronize
vec.set_element_async(0, v, stream);
...
cudaStreamSynchronize(stream);
// Synchronization is required before `v` can be modified
v = 13;

Note

This function incurs a host to device memcpy and should be used sparingly.

Note

Calling this function with a literal or other r-value for value is disallowed to prevent the implementation from asynchronously copying from a literal or other implicit temporary after it is deleted or goes out of scope.

Throws:

rmm::out_of_range – exception if element_index >= size()

Parameters:
  • element_index – Index of the target element

  • value – The value to copy to the specified element

  • stream – The stream on which to perform the copy
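
One common C++ way to reject temporaries, as the note above describes, is to delete an overload that binds to r-values. Whether the library uses exactly this mechanism is an assumption. setter_model and accepts are hypothetical names, and the sketch runs entirely on the host.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical setter; only the overload set matters here.
struct setter_model {
  int const* last_source{nullptr};

  // Named objects (l-values) bind here; the caller controls their lifetime.
  void set_async(int const& value) { last_source = &value; }

  // Literals and temporaries select this overload, which is deleted,
  // so such calls fail to compile.
  void set_async(int const&&) = delete;
};

// Detects whether `setter_model::set_async(Arg)` would compile.
template <typename Arg, typename = void>
struct accepts : std::false_type {};

template <typename Arg>
struct accepts<Arg, decltype(std::declval<setter_model&>().set_async(std::declval<Arg>()))>
  : std::true_type {};
```

With this overload set, passing a named int compiles, while passing 42 or the result of an expression is a compile-time error instead of a dangling read at run time.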

inline void set_element_to_zero_async(std::size_t element_index, cuda_stream_view stream)

Asynchronously sets the specified element to zero in device memory.

This function does not synchronize stream before returning.

Example:

rmm::device_uvector<int32_t> vec(100, stream);

// Sets element at index 42 to 0 on `stream`. Does _not_ synchronize
vec.set_element_to_zero_async(42, stream);

Note

This function incurs a device memset and should be used sparingly.

Throws:

rmm::out_of_range – exception if element_index >= size()

Parameters:
  • element_index – Index of the target element

  • stream – The stream on which to perform the memset

inline void set_element(std::size_t element_index, T const &value, cuda_stream_view stream)

Performs a synchronous copy of value to the specified element in device memory.

Because this function synchronizes stream, it is safe to destroy or modify the object referenced by value after this function has returned.

Example:

rmm::device_uvector<int32_t> vec(100, stream);

int v{42};

// Copies 42 to element 0 on `stream` and synchronizes the stream
vec.set_element(0, v, stream);

// It is safe to destroy or modify `v`
v = 13;

Note

This function incurs a host to device memcpy and should be used sparingly.

Note

This function synchronizes stream.

Throws:

rmm::out_of_range – exception if element_index >= size()

Parameters:
  • element_index – Index of the target element

  • value – The value to copy to the specified element

  • stream – The stream on which to perform the copy

inline value_type element(std::size_t element_index, cuda_stream_view stream) const

Returns the specified element from device memory.

Note

This function incurs a device to host memcpy and should be used sparingly.

Note

This function synchronizes stream.

Throws:

rmm::out_of_range – exception if element_index >= size()

Parameters:
  • element_index – Index of the desired element

  • stream – The stream on which to perform the copy

Returns:

The value of the specified element

inline value_type front_element(cuda_stream_view stream) const

Returns the first element.

Note

This function incurs a device-to-host memcpy and should be used sparingly.

Note

This function synchronizes stream.

Throws:

rmm::out_of_range – exception if the vector is empty.

Parameters:

stream – The stream on which to perform the copy

Returns:

The value of the first element

inline value_type back_element(cuda_stream_view stream) const

Returns the last element.

Note

This function incurs a device-to-host memcpy and should be used sparingly.

Note

This function synchronizes stream.

Throws:

rmm::out_of_range – exception if the vector is empty.

Parameters:

stream – The stream on which to perform the copy

Returns:

The value of the last element

inline void reserve(std::size_t new_capacity, cuda_stream_view stream)

Increases the capacity of the vector to new_capacity elements.

If new_capacity <= capacity(), no action is taken.

If new_capacity > capacity(), a new allocation of size new_capacity is created, and the first size() elements from the current allocation are copied there as if by memcpy. Finally, the old allocation is freed and replaced by the new allocation.

Parameters:
  • new_capacity – The desired capacity (number of elements)

  • stream – The stream on which to perform the allocation/copy (if any)

inline void resize(std::size_t new_size, cuda_stream_view stream)

Resizes the vector to contain new_size elements.

If new_size > size(), the additional elements are uninitialized.

If new_size <= capacity(), no action is taken other than updating the value of size(). No memory is allocated or copied. shrink_to_fit() may be used to force deallocation of unused memory.

If new_size > capacity(), elements are copied as if by memcpy to a new allocation.

The invariant size() <= capacity() holds.

Parameters:
  • new_size – The desired number of elements

  • stream – The stream on which to perform the allocation/copy (if any)

inline void shrink_to_fit(cuda_stream_view stream)

Forces deallocation of unused device memory.

If capacity() > size(), reallocates and copies vector contents to eliminate unused memory.

Parameters:

stream – Stream on which to perform allocation and copy

inline device_buffer release() noexcept

Release ownership of device memory storage.

Returns:

The device_buffer used to store the vector elements
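
Since device_buffer counts bytes and device_uvector counts elements, code that calls release() usually needs to convert between the two. The helpers below are a sketch: elements_to_bytes and bytes_to_elements are hypothetical free functions, and the relation size in bytes = number of elements * sizeof(T) is an assumption based on the contiguous storage described for this class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to hold `num_elements` contiguous objects of type T.
template <typename T>
constexpr std::size_t elements_to_bytes(std::size_t num_elements) noexcept
{
  return num_elements * sizeof(T);
}

// Whole objects of type T that fit in `num_bytes`.
template <typename T>
constexpr std::size_t bytes_to_elements(std::size_t num_bytes) noexcept
{
  return num_bytes / sizeof(T);
}
```

Under that assumption, a vector of 100 std::int32_t elements would correspond to a released buffer of 400 bytes.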

inline std::size_t capacity() const noexcept

Returns the number of elements that can be held in currently allocated storage.

Returns:

std::size_t The number of elements that can be stored without requiring a new allocation.

inline pointer data() noexcept

Returns pointer to underlying device storage.

Note

If size() == 0, it is undefined behavior to dereference the returned pointer. Furthermore, the returned pointer may or may not be equal to nullptr.

Returns:

Raw pointer to element storage in device memory.

inline const_pointer data() const noexcept

Returns const pointer to underlying device storage.

Note

If size() == 0, it is undefined behavior to dereference the returned pointer. Furthermore, the returned pointer may or may not be equal to nullptr.

Returns:

const_pointer Raw const pointer to element storage in device memory.

inline iterator begin() noexcept

Returns an iterator to the first element.

If the vector is empty, then begin() == end().

Returns:

Iterator to the first element.

inline const_iterator cbegin() const noexcept

Returns a const_iterator to the first element.

If the vector is empty, then cbegin() == cend().

Returns:

Immutable iterator to the first element.

inline const_iterator begin() const noexcept

Returns a const_iterator to the first element.

If the vector is empty, then begin() == end().

Returns:

Immutable iterator to the first element.

inline iterator end() noexcept

Returns an iterator to the element following the last element of the vector.

The element referenced by end() is a placeholder and dereferencing it results in undefined behavior.

Returns:

Iterator to one past the last element.

inline const_iterator cend() const noexcept

Returns a const_iterator to the element following the last element of the vector.

The element referenced by cend() is a placeholder and dereferencing it results in undefined behavior.

Returns:

Immutable iterator to one past the last element.

inline const_iterator end() const noexcept

Returns a const_iterator to the element following the last element of the vector.

The element referenced by end() is a placeholder and dereferencing it results in undefined behavior.

Returns:

Immutable iterator to one past the last element.
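
Because iterator and const_iterator are aliases of raw pointer types, begin() and end() delimit a half-open range [begin, end). The host-side sketch below uses a hypothetical range_model over ordinary host memory to show the arithmetic. Pointers obtained from a real device_uvector refer to device memory and are meant for device code or stream-ordered copies, not for dereferencing on the host as the test of this model does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side view that mirrors the alias `iterator = pointer`.
template <typename T>
struct range_model {
  using iterator = T*;
  T* ptr;
  std::size_t count;

  iterator begin() const noexcept { return ptr; }
  iterator end() const noexcept { return ptr + count; }  // one past the last element
};

// Number of elements in the half-open range [begin, end).
template <typename T>
std::size_t range_length(range_model<T> const& r)
{
  return static_cast<std::size_t>(r.end() - r.begin());
}
```

In this model the distance end() - begin() equals the element count, and an empty range satisfies begin() == end(), matching the statements above.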

inline std::size_t size() const noexcept

The number of elements in the vector.

Returns:

The number of elements in the vector

inline std::int64_t ssize() const noexcept

The signed number of elements in the vector.

Returns:

The signed number of elements in the vector

inline bool is_empty() const noexcept

true if the vector contains no elements, i.e. size() == 0

Returns:

true if the vector contains no elements, i.e. size() == 0

inline rmm::device_async_resource_ref memory_resource() const noexcept

The resource used to allocate and deallocate the device storage.

Returns:

The resource used to allocate and deallocate the device storage

inline cuda_stream_view stream() const noexcept

Stream most recently specified for allocation/deallocation.

Returns:

Stream most recently specified for allocation/deallocation

inline void set_stream(cuda_stream_view stream) noexcept

Sets the stream to be used for deallocation.

If no other rmm::device_uvector method that allocates memory is called after this call with a different stream argument, then stream will be used for deallocation in the rmm::device_uvector destructor. However, if either resize() or shrink_to_fit() is called after this, the later stream parameter will be stored and used in the destructor.

Parameters:

stream – The stream to use for deallocation