Copy Split
- group copy_split
Functions
-
std::vector<packed_table> contiguous_split(cudf::table_view const &input, std::vector<size_type> const &splits, rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Performs a deep-copy split of a table_view into a vector of packed_table, where each packed_table uses a single contiguous block of memory for all of the split’s column data.
The memory for the output views is allocated in a single contiguous rmm::device_buffer returned in the packed_table. There is no top-level owning table.
The returned views of input are constructed from a vector of indices that indicate where each split should occur. The i-th returned table_view is sliced as [0, splits[i]) if i = 0, as [splits[i-1], input.size()) if i is the last view, and as [splits[i-1], splits[i]) otherwise.
For all i, it is expected that splits[i] <= splits[i+1] <= input.size(). When splits has size N, the output always contains N+1 splits.
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28}, {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}, {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
Note
It is the caller’s responsibility to ensure that the returned views do not outlive the viewed device memory contained in the data field of the returned packed_table.
- Throws:
std::out_of_range – If splits has an end index greater than the size of input.
std::out_of_range – When a value in splits is not in the range [0, input.size()).
std::invalid_argument – When the values in splits are ‘strictly decreasing’.
- Parameters:
input – View of a table to split
splits – A vector of indices where the view will be split
mr – An optional memory resource to use for all returned device allocations
- Returns:
The set of requested views of input indicated by the splits, together with the viewed memory buffer
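The slicing rule above can be sketched on the host with plain integers. This is only an illustration of how N split indices map to N+1 half-open row ranges; split_ranges is a hypothetical helper, not part of cudf, and it assumes the indices have already been validated.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not a cudf function): turns N split indices into the
// N+1 half-open row ranges [begin, end) described in the text above.
// Assumes the indices are non-decreasing and within [0, num_rows].
std::vector<std::pair<int, int>> split_ranges(int num_rows, std::vector<int> const& splits)
{
  std::vector<std::pair<int, int>> ranges;
  ranges.reserve(splits.size() + 1);
  int begin = 0;
  for (int end : splits) {
    ranges.emplace_back(begin, end);  // [previous split, current split)
    begin = end;
  }
  ranges.emplace_back(begin, num_rows);  // the last range runs to the end of the input
  return ranges;
}
```

For the example above, split_ranges(10, {2, 5, 9}) yields the ranges [0, 2), [2, 5), [5, 9) and [9, 10), which correspond to the four output pieces of each column.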
-
packed_columns pack(cudf::table_view const &input, rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Deep-copy a table_view into a serialized contiguous memory format.
The metadata from the table_view is copied into a host vector of bytes, and the data from the table_view is copied into a device_buffer. Pass the output of this function into cudf::unpack to deserialize.
- Parameters:
input – View of the table to pack
mr – An optional memory resource to use for all returned device allocations
- Returns:
A packed_columns struct containing the serialized metadata and data in contiguous host and device memory, respectively
-
std::vector<uint8_t> pack_metadata(table_view const &table, uint8_t const *contiguous_buffer, size_t buffer_size)
Produce the metadata used for packing a table stored in a contiguous buffer.
The metadata from the table_view is copied into a host vector of bytes, which can be used to construct a packed_columns or packed_table structure. The caller is responsible for guaranteeing that all of the columns in the table point into contiguous_buffer.
- Parameters:
table – View of the table to pack
contiguous_buffer – A contiguous buffer of device memory which contains the data referenced by the columns in table
buffer_size – The size of contiguous_buffer
- Returns:
Vector of bytes representing the metadata used to unpack a packed_columns struct
-
table_view unpack(packed_columns const &input)
Deserialize the result of cudf::pack.
Converts the result of a serialized table into a table_view that points to the data stored in the contiguous device buffer contained in input.
It is the caller’s responsibility to ensure that the table_view in the output does not outlive the data in the input.
No new device memory is allocated in this function.
- Parameters:
input – The packed columns to unpack
- Returns:
The unpacked table_view
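The relationship between pack and unpack (one owning contiguous buffer plus metadata, and non-owning views rebuilt from them) can be modelled on the host. The sketch below is a conceptual analogue only: PackedHost, ColumnView, pack_host and unpack_host are hypothetical names, and the {offset, length} layout is an assumption made for illustration, since cudf's real metadata format is opaque.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only analogue of packed_columns (not cudf's format).
struct PackedHost {
  std::vector<std::size_t> metadata;  // per-column {offset, length} pairs
  std::vector<int> data;              // all column values, stored back to back
};

// Non-owning view into PackedHost::data, playing the role of a column_view.
struct ColumnView {
  int const* ptr;
  std::size_t size;
};

// Deep-copies every column into one contiguous buffer and records where each one lives.
PackedHost pack_host(std::vector<std::vector<int>> const& columns)
{
  PackedHost out;
  for (auto const& col : columns) {
    out.metadata.push_back(out.data.size());  // offset of this column
    out.metadata.push_back(col.size());       // length of this column
    out.data.insert(out.data.end(), col.begin(), col.end());
  }
  return out;
}

// Rebuilds views that point into packed.data; no column values are copied.
// The views must not outlive `packed`.
std::vector<ColumnView> unpack_host(PackedHost const& packed)
{
  std::vector<ColumnView> views;
  for (std::size_t i = 0; i + 1 < packed.metadata.size(); i += 2) {
    views.push_back({packed.data.data() + packed.metadata[i], packed.metadata[i + 1]});
  }
  return views;
}
```

As in the real API, the lifetime rule follows directly from the design: the views hold raw pointers into the packed buffer, so destroying or moving the buffer invalidates them.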
-
table_view unpack(uint8_t const *metadata, uint8_t const *gpu_data)
Deserialize the result of cudf::pack.
Converts the result of a serialized table into a table_view that points to the data stored in the contiguous device buffer contained in gpu_data, using the metadata contained in the host buffer metadata.
It is the caller’s responsibility to ensure that the table_view in the output does not outlive the data in the input.
No new device memory is allocated in this function.
- Parameters:
metadata – The host-side metadata buffer resulting from the initial pack() call
gpu_data – The device-side contiguous buffer storing the data that will be referenced by the resulting table_view
- Returns:
The unpacked table_view
-
std::vector<column_view> split(column_view const &input, host_span<size_type const> splits, rmm::cuda_stream_view stream = cudf::get_default_stream())
Splits a column_view into a set of column_views according to a set of indices derived from expected splits.
The returned views of input are constructed from a vector of splits, which indicates where each split should occur. The i-th returned column_view is sliced as [0, splits[i]) if i = 0, as [splits[i-1], input.size()) if i is the last view, and as [splits[i-1], splits[i]) otherwise.
For all i, it is expected that splits[i] <= splits[i+1] <= input.size(). When splits has size N, the output always contains N+1 splits.
Example:
input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
splits: {2, 5, 9}
output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
Note
It is the caller’s responsibility to ensure that the returned views do not outlive the viewed device memory.
- Throws:
std::out_of_range – If splits has an end index greater than the size of input.
std::out_of_range – When a value in splits is not in the range [0, input.size()).
std::invalid_argument – When the values in splits are ‘strictly decreasing’.
- Parameters:
input – View of column to split
splits – Indices where the view will be split
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
The set of requested views of input indicated by the splits
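The error conditions listed under Throws can be pictured as a host-side check run before any slicing. The sketch below is an assumption about how such a check might look, not cudf's implementation: validate_splits is a hypothetical name, and it treats an index equal to the input size as valid, which is one reading of the splits[i] <= input.size() expectation stated above.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical host-side check (not cudf's implementation).
// Assumption: an index equal to num_rows is accepted and simply produces an
// empty trailing piece.
void validate_splits(int num_rows, std::vector<int> const& splits)
{
  int previous = 0;
  for (int s : splits) {
    if (s < 0 || s > num_rows) { throw std::out_of_range("split index outside the input"); }
    if (s < previous) { throw std::invalid_argument("split indices must not decrease"); }
    previous = s;
  }
}
```

Under these assumptions, validate_splits(10, {2, 5, 9}) returns normally, {2, 11} raises std::out_of_range, and {5, 2} raises std::invalid_argument.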
-
std::vector<column_view> split(column_view const &input, std::initializer_list<size_type> splits, rmm::cuda_stream_view stream = cudf::get_default_stream())
Splits a column_view into a set of column_views according to a set of indices derived from expected splits.
The returned views of input are constructed from a vector of splits, which indicates where each split should occur. The i-th returned column_view is sliced as [0, splits[i]) if i = 0, as [splits[i-1], input.size()) if i is the last view, and as [splits[i-1], splits[i]) otherwise.
For all i, it is expected that splits[i] <= splits[i+1] <= input.size(). When splits has size N, the output always contains N+1 splits.
Example:
input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
splits: {2, 5, 9}
output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
Note
It is the caller’s responsibility to ensure that the returned views do not outlive the viewed device memory.
- Throws:
std::out_of_range – If splits has an end index greater than the size of input.
std::out_of_range – When a value in splits is not in the range [0, input.size()).
std::invalid_argument – When the values in splits are ‘strictly decreasing’.
- Parameters:
input – View of column to split
splits – Indices where the view will be split
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
The set of requested views of input indicated by the splits
-
std::vector<table_view> split(table_view const &input, host_span<size_type const> splits, rmm::cuda_stream_view stream = cudf::get_default_stream())
Splits a table_view into a set of table_views according to a set of indices derived from expected splits.
The returned views of input are constructed from a vector of splits, which indicates where each split should occur. The i-th returned table_view is sliced as [0, splits[i]) if i = 0, as [splits[i-1], input.size()) if i is the last view, and as [splits[i-1], splits[i]) otherwise.
For all i, it is expected that splits[i] <= splits[i+1] <= input.size(). When splits has size N, the output always contains N+1 splits.
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28}, {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}, {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
Note
It is the caller’s responsibility to ensure that the returned views do not outlive the viewed device memory.
- Throws:
std::out_of_range – If splits has an end index greater than the size of input.
std::out_of_range – When a value in splits is not in the range [0, input.size()).
std::invalid_argument – When the values in splits are ‘strictly decreasing’.
- Parameters:
input – View of a table to split
splits – Indices where the view will be split
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
The set of requested views of input indicated by the splits
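For a table, the same row ranges apply to every column, so each output piece contains one slice per input column. The host-only sketch below illustrates this with nested vectors; Column, Table and split_table are hypothetical stand-ins, the split indices are assumed to be valid, and rows are copied for simplicity whereas cudf::split returns non-owning views.

```cpp
#include <cstddef>
#include <vector>

using Column = std::vector<int>;
using Table  = std::vector<Column>;  // hypothetical host stand-in: one vector per column

// Applies the same half-open row ranges to every column of the input.
// Copies the rows for simplicity; the real API returns views instead.
std::vector<Table> split_table(Table const& input, std::vector<std::size_t> const& splits)
{
  std::size_t const num_rows = input.empty() ? 0 : input.front().size();
  std::vector<Table> result;
  std::size_t begin = 0;
  for (std::size_t i = 0; i <= splits.size(); ++i) {
    // The final piece ends at the last row; every other piece ends at splits[i].
    std::size_t const end = (i < splits.size()) ? splits[i] : num_rows;
    Table piece;
    for (Column const& col : input) {
      piece.emplace_back(col.begin() + begin, col.begin() + end);
    }
    result.push_back(piece);
    begin = end;
  }
  return result;
}
```

With the two-column input from the example and splits {2, 5, 9}, the result holds four tables; the second one contains {14, 16, 18} and {54, 56, 58}.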
-
std::vector<table_view> split(table_view const &input, std::initializer_list<size_type> splits, rmm::cuda_stream_view stream = cudf::get_default_stream())
Splits a table_view into a set of table_views according to a set of indices derived from expected splits.
The returned views of input are constructed from a vector of splits, which indicates where each split should occur. The i-th returned table_view is sliced as [0, splits[i]) if i = 0, as [splits[i-1], input.size()) if i is the last view, and as [splits[i-1], splits[i]) otherwise.
For all i, it is expected that splits[i] <= splits[i+1] <= input.size(). When splits has size N, the output always contains N+1 splits.
Example:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28}, {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
splits: {2, 5, 9}
output: [{{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}, {{50, 52}, {54, 56, 58}, {60, 62, 64, 66}, {68}}]
Note
It is the caller’s responsibility to ensure that the returned views do not outlive the viewed device memory.
- Throws:
std::out_of_range – If splits has an end index greater than the size of input.
std::out_of_range – When a value in splits is not in the range [0, input.size()).
std::invalid_argument – When the values in splits are ‘strictly decreasing’.
- Parameters:
input – View of a table to split
splits – Indices where the view will be split
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
The set of requested views of input indicated by the splits
-
struct packed_columns
- #include <contiguous_split.hpp>
Column data in a serialized format.
Contains data from an array of columns in two contiguous buffers: one on host, which contains the table metadata, and one on device, which contains the table data.
Public Functions
-
inline packed_columns(std::unique_ptr<std::vector<uint8_t>> &&md, std::unique_ptr<rmm::device_buffer> &&gd)
Construct a new packed columns object.
- Parameters:
md – Host-side metadata buffer
gd – Device-side data buffer
Public Members
-
std::unique_ptr<std::vector<uint8_t>> metadata
Host-side metadata buffer.
-
std::unique_ptr<rmm::device_buffer> gpu_data
Device-side data buffer.
-
struct packed_table
- #include <contiguous_split.hpp>
The result(s) of a cudf::contiguous_split.
Each table_view resulting from a split operation performed by contiguous_split will be returned wrapped in a packed_table. The table_view and internal column_views in this struct are not owned by a top-level cudf::table or cudf::column. The backing memory and metadata are instead owned by the data field and are in one contiguous block.
The user is responsible for ensuring that the table or any derived table_views do not outlive the memory owned by data.
Public Members
-
cudf::table_view table
Result table_view of a cudf::contiguous_split.
-
packed_columns data
Column data owned.
-
class chunked_pack
- #include <contiguous_split.hpp>
Perform a chunked “pack” operation of the input table_view using a user-provided buffer of size user_buffer_size.
The intent of this operation is to be used in a streamed fashion at times of GPU out-of-memory, where we want to minimize the number of small cudaMemcpy calls and the tracking of all the metadata associated with cudf tables. Because of the memory constraints, all thrust and scratch memory allocations use the passed-in memory resource exclusively, not a per-device memory resource.
This class defines two methods that must be used in concert to carry out the chunked_pack: has_next and next. Here is an example:
// Create a table_view
cudf::table_view tv = ...;

// Choose a memory resource (optional). This memory resource is used for scratch/thrust temporary
// data. In memory constrained cases, this can be used to set aside scratch memory
// for `chunked_pack` at the beginning of a program.
auto mr = cudf::get_current_device_resource_ref();

// Define a buffer size for each chunk: the larger the buffer is, the more SMs can be
// occupied by this algorithm.
//
// Internally, the GPU unit of work is a 1MB batch. When we instantiate `cudf::chunked_pack`,
// all the 1MB batches for the source table_view are computed up front. Additionally,
// chunked_pack calculates the number of iterations that are required to go through all those
// batches given a `user_buffer_size` buffer. The number of 1MB batches in each iteration (chunk)
// equals the number of CUDA blocks that will be used for the main kernel launch.
//
std::size_t user_buffer_size = 128*1024*1024;

auto chunked_packer = cudf::chunked_pack::create(tv, user_buffer_size, mr);

std::size_t host_offset = 0;
auto host_buffer = ...; // obtain a host buffer you would like to copy to

while (chunked_packer->has_next()) {
  // get a user buffer of size `user_buffer_size`
  cudf::device_span<uint8_t> user_buffer = ...;
  std::size_t bytes_copied = chunked_packer->next(user_buffer);

  // buffer will hold the contents of at most `user_buffer_size` bytes
  // of the contiguously packed input `table_view`. You are now free to copy
  // this memory somewhere else, for example, to host.
  cudaMemcpyAsync(
    host_buffer.data() + host_offset,
    user_buffer.data(),
    bytes_copied,
    cudaMemcpyDefault,
    stream);

  host_offset += bytes_copied;
}
Public Functions
-
explicit chunked_pack(cudf::table_view const &input, std::size_t user_buffer_size, rmm::device_async_resource_ref temp_mr = cudf::get_current_device_resource_ref())
Construct a chunked_pack class.
- Parameters:
input – source table_view to pack
user_buffer_size – size (in bytes) of the buffer that will be passed to next. Must be at least 1MB
temp_mr – An optional memory resource to be used for temporary and scratch allocations only
-
~chunked_pack()
Destructor that will be implemented as default. Declared with definition here because contiguous_split_state is incomplete at this stage.
-
std::size_t get_total_contiguous_size() const
Obtain the total size of the contiguously packed table_view.
- Returns:
total size (in bytes) of all the chunks
-
bool has_next() const
Function to check if there are chunks left to be copied.
- Returns:
true if there are chunks left to be copied, and false otherwise
-
std::size_t next(cudf::device_span<uint8_t> const &user_buffer)
Packs the next chunk into user_buffer. This should be called as long as has_next returns true. If next is called when has_next is false, an exception is thrown.
- Throws:
cudf::logic_error – If the size of user_buffer is different from user_buffer_size
cudf::logic_error – If called after all chunks have been copied
- Parameters:
user_buffer – device span target for the chunk. The size of this span must equal the user_buffer_size parameter passed at construction
- Returns:
The number of bytes that were written to user_buffer (at most user_buffer_size)
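The has_next/next contract described above (a fixed buffer size, a byte count returned per call, and an error when the buffer size is wrong or nothing is left) can be modelled on the host. The class below is a hypothetical stand-in written only to illustrate the calling protocol: HostChunker is not a cudf type, it copies plain host bytes, and it does not model the 1MB batching or any GPU work.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for the has_next/next protocol.
// It copies bytes in fixed-size chunks and does not model cudf's batching.
class HostChunker {
 public:
  HostChunker(std::vector<std::uint8_t> source, std::size_t buffer_size)
    : source_(std::move(source)), buffer_size_(buffer_size)
  {
  }

  // True while some bytes have not been handed out yet.
  bool has_next() const { return offset_ < source_.size(); }

  // Copies the next chunk into `buffer` and returns the number of bytes written.
  std::size_t next(std::vector<std::uint8_t>& buffer)
  {
    if (buffer.size() != buffer_size_) { throw std::logic_error("buffer size mismatch"); }
    if (!has_next()) { throw std::logic_error("no chunks left"); }
    std::size_t const n = std::min(buffer_size_, source_.size() - offset_);
    std::copy_n(source_.begin() + offset_, n, buffer.begin());
    offset_ += n;
    return n;
  }

 private:
  std::vector<std::uint8_t> source_;
  std::size_t buffer_size_;
  std::size_t offset_{0};
};
```

A caller loops while has_next() is true and appends only the returned number of bytes from the buffer, since the final chunk is usually shorter than the buffer.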
-
std::unique_ptr<std::vector<uint8_t>> build_metadata() const
Build the opaque metadata for all added columns.
- Returns:
A vector containing the serialized column metadata
Public Static Functions
-
static std::unique_ptr<chunked_pack> create(cudf::table_view const &input, std::size_t user_buffer_size, rmm::device_async_resource_ref temp_mr = cudf::get_current_device_resource_ref())
Creates a chunked_pack instance to perform a “pack” of the table_view input, where a buffer of user_buffer_size is filled with chunks of the overall operation. This operation can be used in cases where GPU memory is constrained.
The memory resource (temp_mr) could be a special memory resource to be used in situations when GPU memory is low and we want scratch and temporary allocations to happen from a small reserved pool of memory. Note that it defaults to the regular cuDF per-device resource.
- Throws:
cudf::logic_error – When user_buffer_size is less than 1MB
- Parameters:
input – source table_view to pack
user_buffer_size – size (in bytes) of the buffer that will be passed to next. Must be at least 1MB
temp_mr – RMM memory resource to be used for temporary and scratch allocations only
- Returns:
a unique_ptr of chunked_pack