cudf::chunked_pack Class Reference

Perform a chunked "pack" operation of the input table_view using a user provided buffer of size user_buffer_size.

#include <contiguous_split.hpp>

Public Member Functions

 chunked_pack (cudf::table_view const &input, std::size_t user_buffer_size, rmm::device_async_resource_ref temp_mr=rmm::mr::get_current_device_resource())
 Construct a chunked_pack class.
 
 ~chunked_pack ()
 Destructor that will be implemented as default. Declared with definition here because contiguous_split_state is incomplete at this stage.
 
std::size_t get_total_contiguous_size () const
 Obtain the total size of the contiguously packed table_view.
 
bool has_next () const
 Function to check if there are chunks left to be copied.
 
std::size_t next (cudf::device_span< uint8_t > const &user_buffer)
 Packs the next chunk into user_buffer. This should be called as long as has_next returns true. If next is called when has_next is false, an exception is thrown.
 
std::unique_ptr< std::vector< uint8_t > > build_metadata () const
 Build the opaque metadata for all added columns.
 

Static Public Member Functions

static std::unique_ptr< chunked_pack > create (cudf::table_view const &input, std::size_t user_buffer_size, rmm::device_async_resource_ref temp_mr=rmm::mr::get_current_device_resource())
 Creates a chunked_pack instance to perform a "pack" of the table_view "input", where a buffer of user_buffer_size is filled with chunks of the overall operation. This operation can be used in cases where GPU memory is constrained.
 

Detailed Description

Perform a chunked "pack" operation of the input table_view using a user provided buffer of size user_buffer_size.

This operation is intended to be used in a streamed fashion when the GPU is out of memory or close to it, where we want to minimize both the number of small cudaMemcpy calls and the tracking of all the metadata associated with cudf tables. Because of these memory constraints, all thrust and scratch memory allocations use the passed-in memory resource exclusively, not a per-device memory resource.

This class defines two methods that must be used in concert to carry out the chunked_pack: has_next and next. Here is an example:

// Create a table_view
cudf::table_view tv = ...;
// Choose a memory resource (optional). This memory resource is used for scratch/thrust temporary
// data. In memory constrained cases, this can be used to set aside scratch memory
// for `chunked_pack` at the beginning of a program.
auto mr = rmm::mr::get_current_device_resource();
// Define a buffer size for each chunk: the larger the buffer is, the more SMs can be
// occupied by this algorithm.
//
// Internally, the GPU unit of work is a 1MB batch. When we instantiate `cudf::chunked_pack`,
// all the 1MB batches for the source table_view are computed up front. Additionally,
// chunked_pack calculates the number of iterations that are required to go through all those
// batches given a `user_buffer_size` buffer. The number of 1MB batches in each iteration (chunk)
// equals the number of CUDA blocks that will be used for the main kernel launch.
//
std::size_t user_buffer_size = 128*1024*1024;
auto chunked_packer = cudf::chunked_pack::create(tv, user_buffer_size, mr);
std::size_t host_offset = 0;
auto host_buffer = ...; // obtain a host buffer you would like to copy to
while (chunked_packer->has_next()) {
  // get a user buffer of size `user_buffer_size`
  cudf::device_span<uint8_t> user_buffer = ...;
  std::size_t bytes_copied = chunked_packer->next(user_buffer);
  // `user_buffer` now holds at most `user_buffer_size` bytes
  // of the contiguously packed input `table_view`. You are now free to copy
  // this memory somewhere else, for example, to host.
  // (`stream` is a CUDA stream supplied by the caller; it is not defined in this snippet.)
  cudaMemcpyAsync(
    host_buffer.data() + host_offset,
    user_buffer.data(),
    bytes_copied,
    cudaMemcpyDefault,
    stream);
  host_offset += bytes_copied;
}

Definition at line 194 of file contiguous_split.hpp.

Constructor & Destructor Documentation

◆ chunked_pack()

cudf::chunked_pack::chunked_pack ( cudf::table_view const &  input,
std::size_t  user_buffer_size,
rmm::device_async_resource_ref  temp_mr = rmm::mr::get_current_device_resource() 
)
explicit

Construct a chunked_pack class.

Parameters
input             Source table_view to pack
user_buffer_size  Size (in bytes) of the buffer that will be passed to each call to next. Must be at least 1MB
temp_mr           An optional memory resource to be used for temporary and scratch allocations only

Member Function Documentation

◆ build_metadata()

std::unique_ptr<std::vector<uint8_t> > cudf::chunked_pack::build_metadata ( ) const

Build the opaque metadata for all added columns.

Returns
A vector containing the serialized column metadata

◆ create()

static std::unique_ptr<chunked_pack> cudf::chunked_pack::create ( cudf::table_view const &  input,
std::size_t  user_buffer_size,
rmm::device_async_resource_ref  temp_mr = rmm::mr::get_current_device_resource() 
)
static

Creates a chunked_pack instance to perform a "pack" of the table_view "input", where a buffer of user_buffer_size is filled with chunks of the overall operation. This operation can be used in cases where GPU memory is constrained.

The memory resource (temp_mr) could be a special memory resource to be used in situations when GPU memory is low and we want scratch and temporary allocations to happen from a small reserved pool of memory. Note that it defaults to the regular cuDF per-device resource.

Exceptions
cudf::logic_error  When user_buffer_size is less than 1MB
Parameters
input             Source table_view to pack
user_buffer_size  Size (in bytes) of the buffer that will be passed to each call to next. Must be at least 1MB
temp_mr           RMM memory resource to be used for temporary and scratch allocations only
Returns
a unique_ptr of chunked_pack

◆ get_total_contiguous_size()

std::size_t cudf::chunked_pack::get_total_contiguous_size ( ) const

Obtain the total size of the contiguously packed table_view.

Returns
total size (in bytes) of all the chunks
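
As a rough illustration of how this value might be used: each call to next writes at most user_buffer_size bytes, so the total size gives a lower bound on the number of next calls needed to drain the packer. The helper below, min_next_calls, is hypothetical and not part of cuDF. The actual number of chunks may be higher, because chunk boundaries are chosen internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (NOT a cuDF function): the smallest number of next()
// calls that could drain a packer, given that every call writes at most
// user_buffer_size bytes. The real chunk count may be larger.
inline std::size_t min_next_calls(std::size_t total_contiguous_size,
                                  std::size_t user_buffer_size)
{
  assert(user_buffer_size > 0);
  // Ceiling division written to avoid overflow in (a + b - 1).
  return total_contiguous_size / user_buffer_size +
         (total_contiguous_size % user_buffer_size != 0 ? 1 : 0);
}
```

The same total can also be used to size the destination (for example, a host buffer) once, before the has_next/next loop starts.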

◆ has_next()

bool cudf::chunked_pack::has_next ( ) const

Function to check if there are chunks left to be copied.

Returns
true if there are chunks left to be copied, and false otherwise

◆ next()

std::size_t cudf::chunked_pack::next ( cudf::device_span< uint8_t > const &  user_buffer)

Packs the next chunk into user_buffer. This should be called as long as has_next returns true. If next is called when has_next is false, an exception is thrown.

Exceptions
cudf::logic_error  If the size of user_buffer is different from user_buffer_size
cudf::logic_error  If called after all chunks have been copied
Parameters
user_buffer  Device span target for the chunk. The size of this span must equal the user_buffer_size parameter passed at construction
Returns
The number of bytes that were written to user_buffer (at most user_buffer_size)
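
To make the calling contract above concrete, here is a minimal host-only sketch. mock_chunked_pack and drain are hypothetical stand-ins, not cuDF code: the mock copies from a std::vector instead of packing a table on the GPU, throws std::logic_error rather than cudf::logic_error, fills every chunk greedily (the real class works in 1MB batches, so its chunks may be smaller), and assumes "1MB" means 1024 * 1024 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for cudf::chunked_pack (NOT the cuDF implementation).
class mock_chunked_pack {
 public:
  // Assumption: the documented 1MB minimum is 1024 * 1024 bytes.
  static constexpr std::size_t min_buffer_size = 1024 * 1024;

  mock_chunked_pack(std::vector<std::uint8_t> packed, std::size_t user_buffer_size)
    : packed_(std::move(packed)), user_buffer_size_(user_buffer_size)
  {
    if (user_buffer_size_ < min_buffer_size) {
      throw std::logic_error("user_buffer_size must be at least 1MB");
    }
  }

  std::size_t get_total_contiguous_size() const { return packed_.size(); }

  bool has_next() const { return offset_ < packed_.size(); }

  // Mirrors the documented contract: the buffer size must match the size
  // given at construction, and calling after the last chunk is an error.
  std::size_t next(std::vector<std::uint8_t>& user_buffer)
  {
    if (user_buffer.size() != user_buffer_size_) {
      throw std::logic_error("user_buffer size differs from user_buffer_size");
    }
    if (!has_next()) { throw std::logic_error("all chunks have already been copied"); }
    std::size_t const n = std::min(user_buffer_size_, packed_.size() - offset_);
    std::copy_n(packed_.begin() + static_cast<std::ptrdiff_t>(offset_), n, user_buffer.begin());
    offset_ += n;
    return n;  // at most user_buffer_size_
  }

 private:
  std::vector<std::uint8_t> packed_;
  std::size_t user_buffer_size_;
  std::size_t offset_{0};
};

// Drains the mock with the has_next/next loop and returns the concatenated chunks.
inline std::vector<std::uint8_t> drain(mock_chunked_pack& packer, std::size_t user_buffer_size)
{
  std::vector<std::uint8_t> out;
  out.reserve(packer.get_total_contiguous_size());
  std::vector<std::uint8_t> user_buffer(user_buffer_size);
  while (packer.has_next()) {
    std::size_t const n = packer.next(user_buffer);
    out.insert(out.end(), user_buffer.begin(), user_buffer.begin() + static_cast<std::ptrdiff_t>(n));
  }
  return out;
}
```

Only the first n bytes of the buffer are meaningful after each call, which is why the loop appends by the returned count rather than by the buffer size.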
