Perform a chunked "pack" operation of the input table_view using a user provided buffer of size user_buffer_size.

#include <contiguous_split.hpp>

Public Member Functions
	chunked_pack (cudf::table_view const &input, std::size_t user_buffer_size, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref temp_mr=cudf::get_current_device_resource_ref())
	Construct a `chunked_pack` class.

	~chunked_pack ()
	Destructor that will be implemented as default. Declared with definition here because contiguous_split_state is incomplete at this stage.

std::size_t	get_total_contiguous_size () const
	Obtain the total size of the contiguously packed `table_view`.

bool	has_next () const
	Function to check if there are chunks left to be copied.

std::size_t	next (cudf::device_span< uint8_t > const &user_buffer)
	Packs the next chunk into `user_buffer`. This should be called as long as `has_next` returns true. If `next` is called when `has_next` is false, an exception is thrown.

std::unique_ptr< std::vector< uint8_t > >	build_metadata () const
	Build the opaque metadata for all added columns.

Static Public Member Functions
static std::unique_ptr< chunked_pack >	create (cudf::table_view const &input, std::size_t user_buffer_size, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref temp_mr=cudf::get_current_device_resource_ref())
	Creates a `chunked_pack` instance to perform a "pack" of the `table_view` "input", where a buffer of `user_buffer_size` is filled with chunks of the overall operation. This operation can be used in cases where GPU memory is constrained.

Detailed Description

Perform a chunked "pack" operation of the input table_view using a user provided buffer of size user_buffer_size.

This operation is intended to be used in a streamed fashion when the GPU is running out of memory, where we want to minimize both the number of small cudaMemcpy calls and the tracking of the metadata associated with cudf tables. Because of these memory constraints, all thrust and scratch memory allocations use the passed-in memory resource exclusively, not the current per-device memory resource.

This class defines two methods that must be used in concert to carry out the chunked_pack: has_next and next. Here is an example:

// Create a table_view
cudf::table_view tv = ...;
 
// Choose a memory resource (optional). This memory resource is used for scratch/thrust temporary
// data. In memory constrained cases, this can be used to set aside scratch memory
// for `chunked_pack` at the beginning of a program.
auto mr = cudf::get_current_device_resource_ref();
 
// Define a buffer size for each chunk: the larger the buffer is, the more SMs can be
// occupied by this algorithm.
//
// Internally, the GPU unit of work is a 1MB batch. When we instantiate `cudf::chunked_pack`,
// all the 1MB batches for the source table_view are computed up front. Additionally,
// chunked_pack calculates the number of iterations that are required to go through all those
// batches given a `user_buffer_size` buffer. The number of 1MB batches in each iteration (chunk)
// equals the number of CUDA blocks that will be used for the main kernel launch.
//
std::size_t user_buffer_size = 128*1024*1024;
 
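// `create` takes the stream as its third argument and the temporary memory
// resource as its fourth, so both are passed explicitly here.
auto stream = cudf::get_default_stream();
auto chunked_packer = cudf::chunked_pack::create(tv, user_buffer_size, stream, mr);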
 
std::size_t host_offset = 0;
auto host_buffer = ...; // obtain a host buffer you would like to copy to
 
while (chunked_packer->has_next()) {
  // get a user buffer of size `user_buffer_size`
  cudf::device_span<uint8_t> user_buffer = ...;
  std::size_t bytes_copied = chunked_packer->next(user_buffer);
 
  // `user_buffer` now holds `bytes_copied` bytes (at most `user_buffer_size`)
  // of the contiguously packed input `table_view`. You are now free to copy
  // this memory somewhere else, for example, to host.
  cudaMemcpyAsync(
    host_buffer.data() + host_offset,
    user_buffer.data(),
    bytes_copied,
    cudaMemcpyDefault,
    stream);
 
  host_offset += bytes_copied;
}

Definition at line 140 of file contiguous_split.hpp.

Constructor & Destructor Documentation

◆ chunked_pack()

cudf::chunked_pack::chunked_pack	(	cudf::table_view const &	input,
		std::size_t	user_buffer_size,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	temp_mr = `cudf::get_current_device_resource_ref()`
	)

explicit

Construct a chunked_pack class.

Parameters

input	source `table_view` to pack
user_buffer_size	size (in bytes) of the buffer that will be passed to `next`. Must be at least 1MB
stream	CUDA stream used for device memory operations and kernel launches
temp_mr	An optional memory resource to be used for temporary and scratch allocations only

Member Function Documentation

◆ build_metadata()

std::unique_ptr<std::vector<uint8_t> > cudf::chunked_pack::build_metadata ( ) const

Build the opaque metadata for all added columns.

Returns: A vector containing the serialized column metadata

◆ create()

static std::unique_ptr<chunked_pack> cudf::chunked_pack::create	(	cudf::table_view const &	input,
		std::size_t	user_buffer_size,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	temp_mr = `cudf::get_current_device_resource_ref()`
	)

static

Creates a chunked_pack instance to perform a "pack" of the table_view "input", where a buffer of user_buffer_size is filled with chunks of the overall operation. This operation can be used in cases where GPU memory is constrained.

The memory resource (temp_mr) can be a dedicated memory resource for situations where GPU memory is low and scratch and temporary allocations should come from a small reserved pool of memory. Note that it defaults to the current cuDF per-device resource.

Exceptions

cudf::logic_error When user_buffer_size is less than 1MB

Parameters

input	source `table_view` to pack
user_buffer_size	size (in bytes) of the buffer that will be passed to `next`. Must be at least 1MB
stream	CUDA stream used for device memory operations and kernel launches
temp_mr	RMM memory resource to be used for temporary and scratch allocations only

Returns: a unique_ptr to the newly created chunked_pack instance
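
Since `create` throws `cudf::logic_error` when the buffer is too small, a caller may want to check the size first. The following is a minimal sketch of such a check. It assumes that "1MB" means 1024 * 1024 bytes (consistent with the `128*1024*1024` buffer in the example above, but not confirmed here); `assumed_min_buffer_size`, `is_valid_user_buffer_size`, and `clamp_user_buffer_size` are hypothetical helper names, not part of cuDF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// ASSUMPTION: the documented "1MB" minimum is taken to be 1024 * 1024 bytes.
constexpr std::size_t assumed_min_buffer_size = 1024 * 1024;

// Hypothetical helper: true if `n` satisfies the assumed minimum.
constexpr bool is_valid_user_buffer_size(std::size_t n)
{
  return n >= assumed_min_buffer_size;
}

// Hypothetical helper: raises a too-small request to the assumed minimum
// and leaves larger requests unchanged.
constexpr std::size_t clamp_user_buffer_size(std::size_t requested)
{
  return std::max(requested, assumed_min_buffer_size);
}
```

The clamped value would then be used both as the `user_buffer_size` argument to `create` and as the size of every span later passed to `next`, since the two must match.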

◆ get_total_contiguous_size()

std::size_t cudf::chunked_pack::get_total_contiguous_size ( ) const

Obtain the total size of the contiguously packed table_view.

Returns: total size (in bytes) of all the chunks

◆ has_next()

bool cudf::chunked_pack::has_next ( ) const

Function to check if there are chunks left to be copied.

Returns: true if there are chunks left to be copied, and false otherwise

◆ next()

std::size_t cudf::chunked_pack::next ( cudf::device_span< uint8_t > const & user_buffer )

Packs the next chunk into user_buffer. This should be called as long as has_next returns true. If next is called when has_next is false, an exception is thrown.

Exceptions

cudf::logic_error	If the size of `user_buffer` differs from `user_buffer_size`
cudf::logic_error	If called after all chunks have been copied

Parameters

user_buffer	device span that receives the chunk. The size of this span must equal the user_buffer_size parameter passed at construction

Returns: The number of bytes that were written to user_buffer (at most user_buffer_size)
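
The calling contract of `has_next` and `next` can be illustrated without a GPU. The sketch below is a host-only model, not cuDF's implementation: `mock_chunked_pack` and `drain` are hypothetical names, the "packed" data is an ordinary host vector, and `std::logic_error` stands in for `cudf::logic_error`. The mock also fills every chunk except the last completely, which the real class is not stated to guarantee. It only shows the documented rules: the buffer size must match the one given at construction, `next` throws once the chunks are exhausted, and the returned byte counts add up to `get_total_contiguous_size()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for cudf::chunked_pack. It copies from a
// host vector instead of packing a table on the GPU; only the has_next/next
// calling contract is modeled.
class mock_chunked_pack {
 public:
  mock_chunked_pack(std::vector<std::uint8_t> packed, std::size_t user_buffer_size)
    : packed_(std::move(packed)), user_buffer_size_(user_buffer_size)
  {
    // A zero-sized buffer could never make progress (mock-specific check).
    if (user_buffer_size_ == 0) { throw std::invalid_argument("buffer size must be > 0"); }
  }

  std::size_t get_total_contiguous_size() const { return packed_.size(); }

  bool has_next() const { return offset_ < packed_.size(); }

  // Copies the next chunk into `user_buffer` and returns the bytes written.
  std::size_t next(std::vector<std::uint8_t>& user_buffer)
  {
    if (user_buffer.size() != user_buffer_size_) { throw std::logic_error("buffer size mismatch"); }
    if (!has_next()) { throw std::logic_error("no chunks left"); }
    std::size_t const n = std::min(user_buffer_size_, packed_.size() - offset_);
    std::copy_n(packed_.begin() + static_cast<std::ptrdiff_t>(offset_), n, user_buffer.begin());
    offset_ += n;
    return n;
  }

 private:
  std::vector<std::uint8_t> packed_;
  std::size_t user_buffer_size_;
  std::size_t offset_{0};
};

// Drains the packer chunk by chunk and returns the reassembled bytes.
std::vector<std::uint8_t> drain(mock_chunked_pack& packer, std::size_t user_buffer_size)
{
  std::vector<std::uint8_t> out;
  out.reserve(packer.get_total_contiguous_size());
  std::vector<std::uint8_t> user_buffer(user_buffer_size);
  while (packer.has_next()) {
    std::size_t const n = packer.next(user_buffer);
    // Only the first `n` bytes of the buffer are valid for this chunk.
    out.insert(out.end(), user_buffer.begin(), user_buffer.begin() + static_cast<std::ptrdiff_t>(n));
  }
  assert(out.size() == packer.get_total_contiguous_size());
  return out;
}
```

With 10 bytes of input and a 4-byte buffer, `drain` performs three calls to `next` that write 4, 4, and 2 bytes; a further call to `next` throws.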

The documentation for this class was generated from the following file:

contiguous_split.hpp
