Column Reshape
- group Reshaping
Enums
Functions
-
std::unique_ptr<table> explode(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Explodes a list column’s elements.
Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. Example:
[[5,10,15], 100], [[20,25], 200], [[30], 300], returns [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300],
Nulls and empty lists propagate in different ways depending on what is null or empty.
Note that null lists are not included in the resulting table, while nulls inside lists are represented with a null entry for that column in that row. In the example below, the row holding the empty list also produces no output rows.
[[5,null,15], 100], [null, 200], [[], 300], returns [5, 100], [null, 100], [15, 100],
- Parameters:
input_table – Table to explode.
explode_column_idx – Column index to explode inside the table.
stream – CUDA stream used for device memory operations and kernel launches.
mr – Device memory resource used to allocate the returned table’s device memory.
- Returns:
A new table with the column at explode_column_idx exploded.
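The row semantics described above can be sketched on the host with standard containers. This is only an illustrative model of the documented examples, not the libcudf implementation: `explode_model`, `Elem`, `ListRow`, and `OutRow` are hypothetical names, and the table is assumed to have exactly two columns (a list column and an integer column).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Host-side model only: a list row is either null (std::nullopt) or a
// vector of nullable ints; the second column holds plain ints.
using Elem    = std::optional<int>;
using ListRow = std::optional<std::vector<Elem>>;
using OutRow  = std::pair<Elem, int>;  // {exploded element, duplicated value}

// Hypothetical helper mirroring the documented explode() examples:
// null lists and empty lists yield no rows, null elements are kept.
std::vector<OutRow> explode_model(std::vector<ListRow> const& lists,
                                  std::vector<int> const& other)
{
  assert(lists.size() == other.size());
  std::vector<OutRow> out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    if (!lists[row].has_value()) { continue; }  // null list: dropped
    // For an empty list the loop body never runs, so no row is emitted.
    for (Elem const& e : *lists[row]) { out.emplace_back(e, other[row]); }
  }
  return out;
}
```

Feeding the model the null-handling example from the text yields three rows, all taken from the first input row.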
-
std::unique_ptr<table> explode_position(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Explodes a list column’s elements and includes a position column.
Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row. Example:
[[5,10,15], 100], [[20,25], 200], [[30], 300], returns [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300],
Nulls and empty lists propagate in different ways depending on what is null or empty.
Note that null lists are not included in the resulting table, while nulls inside lists are represented with a null entry for that column in that row. In the example below, the row holding the empty list also produces no output rows.
[[5,null,15], 100], [null, 200], [[], 300], returns [0, 5, 100], [1, null, 100], [2, 15, 100],
- Parameters:
input_table – Table to explode.
explode_column_idx – Column index to explode inside the table.
stream – CUDA stream used for device memory operations and kernel launches.
mr – Device memory resource used to allocate the returned table’s device memory.
- Returns:
A new table with exploded value and position. The column order of the returned table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
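The position column and the output column order can be illustrated with a small host-side model. It is a sketch under assumptions, not libcudf code: `explode_position_model` and the type aliases are hypothetical, and the input is assumed to be a two-column table whose list column comes first, so each output row is {position, value, duplicated column}.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <tuple>
#include <vector>

using Elem    = std::optional<int>;
using ListRow = std::optional<std::vector<Elem>>;
// Output row for a two-column input [list, int]:
// {position, exploded value, duplicated int}.
using PosRow  = std::tuple<int, Elem, int>;

// Hypothetical helper: like the explode model, but each output row also
// records the element's index inside its original list.
std::vector<PosRow> explode_position_model(std::vector<ListRow> const& lists,
                                           std::vector<int> const& other)
{
  assert(lists.size() == other.size());
  std::vector<PosRow> out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    if (!lists[row].has_value()) { continue; }  // null list: dropped
    std::vector<Elem> const& list = *lists[row];
    for (std::size_t pos = 0; pos < list.size(); ++pos) {
      out.emplace_back(static_cast<int>(pos), list[pos], other[row]);
    }
  }
  return out;
}
```

The position restarts at 0 for every input row, which matches the first example in the text.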
-
std::unique_ptr<table> explode_outer(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Explodes a list column’s elements retaining any null entries or empty lists inside.
Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. Example:
[[5,10,15], 100], [[20,25], 200], [[30], 300], returns [5, 100], [10, 100], [15, 100], [20, 200], [25, 200], [30, 300],
Nulls and empty lists propagate as null entries in the result.
[[5,null,15], 100], [null, 200], [[], 300], returns [5, 100], [null, 100], [15, 100], [null, 200], [null, 300],
- Parameters:
input_table – Table to explode.
explode_column_idx – Column index to explode inside the table.
stream – CUDA stream used for device memory operations and kernel launches.
mr – Device memory resource used to allocate the returned table’s device memory.
- Returns:
A new table with the column at explode_column_idx exploded.
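The difference from the non-outer variant is that null and empty lists each contribute one row holding a null entry. The following host-side model sketches that rule. It is illustrative only: `explode_outer_model` and the aliases are hypothetical names, and a two-column table (list column plus integer column) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using Elem    = std::optional<int>;
using ListRow = std::optional<std::vector<Elem>>;
using OutRow  = std::pair<Elem, int>;  // {exploded element, duplicated value}

// Hypothetical helper mirroring the documented explode_outer() example:
// a null list or an empty list produces a single row with a null element.
std::vector<OutRow> explode_outer_model(std::vector<ListRow> const& lists,
                                        std::vector<int> const& other)
{
  assert(lists.size() == other.size());
  std::vector<OutRow> out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    bool const null_or_empty = !lists[row].has_value() || lists[row]->empty();
    if (null_or_empty) {
      out.emplace_back(Elem{}, other[row]);  // retained as a null entry
      continue;
    }
    for (Elem const& e : *lists[row]) { out.emplace_back(e, other[row]); }
  }
  return out;
}
```

With the null-handling input from the text, the model returns five rows; the last two carry a null element next to 200 and 300.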
-
std::unique_ptr<table> explode_outer_position(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Explodes a list column’s elements retaining any null entries or empty lists and includes a position column.
Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row. Example:
[[5,10,15], 100], [[20,25], 200], [[30], 300], returns [0, 5, 100], [1, 10, 100], [2, 15, 100], [0, 20, 200], [1, 25, 200], [0, 30, 300],
Nulls and empty lists propagate as null entries in the result.
[[5,null,15], 100], [null, 200], [[], 300], returns [0, 5, 100], [1, null, 100], [2, 15, 100], [0, null, 200], [0, null, 300],
- Parameters:
input_table – Table to explode.
explode_column_idx – Column index to explode inside the table.
stream – CUDA stream used for device memory operations and kernel launches.
mr – Device memory resource used to allocate the returned table’s device memory.
- Returns:
A new table with the column at explode_column_idx exploded and a position column added.
-
std::unique_ptr<column> interleave_columns(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Interleave columns of a table into a single column.
Converts the column-major table input into a row-major column. Example:
in = [[A1, A2, A3], [B1, B2, B3]] return = [A1, B1, A2, B2, A3, B3]
- Throws:
cudf::logic_error – if input contains no columns.
cudf::logic_error – if input column dtypes are not identical.
- Parameters:
input – Table containing columns to interleave
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
The interleaved columns as a single column
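The column-major to row-major conversion amounts to walking the rows and, within each row, visiting every column in order. A minimal host-side sketch follows. `interleave_model` is a hypothetical name, strings stand in for the column elements, and assertions take the place of the documented cudf::logic_error checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using StrColumn = std::vector<std::string>;

// Hypothetical host-side model: element (row, col) of the input lands at
// index row * num_columns + col of the output.
StrColumn interleave_model(std::vector<StrColumn> const& columns)
{
  assert(!columns.empty());  // the documented API throws if there are no columns
  std::size_t const num_rows = columns.front().size();
  StrColumn out;
  out.reserve(num_rows * columns.size());
  for (std::size_t row = 0; row < num_rows; ++row) {
    for (StrColumn const& col : columns) {
      assert(col.size() == num_rows);  // all columns of a table share one length
      out.push_back(col[row]);
    }
  }
  return out;
}
```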
-
std::unique_ptr<table> tile(table_view const &input, size_type count, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Repeats the rows of the input table count times to form a new table.
output.num_columns() == input.num_columns()
output.num_rows() == input.num_rows() * count
input = [[8, 4, 7], [5, 2, 3]] count = 2 return = [[8, 4, 7, 8, 4, 7], [5, 2, 3, 5, 2, 3]]
- Parameters:
input – Table containing rows to be repeated
count – Number of times to tile “rows”. Must be non-negative
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
The table containing the tiled “rows”
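The two size relations stated above can be checked against a small host-side model that appends each column to itself count times. This is a sketch, not the libcudf implementation; `tile_model` and `Column` are hypothetical names, and integer columns are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<int>;

// Hypothetical host-side model: every column is repeated `count` times,
// so the column count is unchanged and the row count is multiplied.
std::vector<Column> tile_model(std::vector<Column> const& input, int count)
{
  assert(count >= 0);  // the documentation requires a non-negative count
  std::vector<Column> output(input.size());
  for (std::size_t c = 0; c < input.size(); ++c) {
    output[c].reserve(input[c].size() * static_cast<std::size_t>(count));
    for (int rep = 0; rep < count; ++rep) {
      output[c].insert(output[c].end(), input[c].begin(), input[c].end());
    }
  }
  return output;
}
```

A count of 0 yields a table with the same number of columns and no rows.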
-
std::unique_ptr<column> byte_cast(column_view const &input_column, flip_endianness endian_configuration, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Converts a column’s elements to lists of bytes.
input<int32> = [8675, 309] configuration = flip_endianness::YES return = [[0x00, 0x00, 0x21, 0xe3], [0x00, 0x00, 0x01, 0x35]]
- Parameters:
input_column – Column to be converted to lists of bytes
endian_configuration – Whether to retain or flip the endianness of the elements
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
The column containing the lists of bytes
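The byte lists in the example above list the most significant byte of each int32 first. The host-side sketch below reproduces exactly that output with shifts, so it does not depend on the byte order of the machine running it. `byte_cast_msb_first_model` is a hypothetical name, and the model covers only the flip_endianness::YES case shown in the example.

```cpp
#include <cstdint>
#include <vector>

using ByteList = std::vector<std::uint8_t>;

// Hypothetical host-side model of the documented example: each 32-bit
// value becomes a list of four bytes, most significant byte first.
std::vector<ByteList> byte_cast_msb_first_model(std::vector<std::int32_t> const& input)
{
  std::vector<ByteList> out;
  out.reserve(input.size());
  for (std::int32_t value : input) {
    auto const bits = static_cast<std::uint32_t>(value);
    out.push_back(ByteList{static_cast<std::uint8_t>((bits >> 24) & 0xFFu),
                           static_cast<std::uint8_t>((bits >> 16) & 0xFFu),
                           static_cast<std::uint8_t>((bits >> 8) & 0xFFu),
                           static_cast<std::uint8_t>(bits & 0xFFu)});
  }
  return out;
}
```

For 8675 (0x21E3) and 309 (0x0135) this gives the two byte lists shown in the example.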
-
void table_to_array(table_view const &input, device_span<cuda::std::byte> output, rmm::cuda_stream_view stream = cudf::get_default_stream())
Copies a table into a contiguous column-major device array.
This function copies a table_view with columns of the same fixed-width type into a 2D device array stored in column-major order.
The output buffer must be preallocated and passed as a device_span<cuda::std::byte>. It must be large enough to hold num_rows * num_columns * sizeof(dtype) bytes.
- Throws:
cudf::logic_error – if columns do not all have the same type
cudf::logic_error – if the dtype of the columns is not a fixed-width type
std::invalid_argument – if the output span is too small
- Parameters:
input – A table with fixed-width, non-nullable columns of the same type
output – A span representing preallocated device memory for the output
stream – CUDA stream used for memory operations
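The layout and the size requirement can be modeled on the host: column c occupies the byte range starting at c * num_rows * sizeof(dtype). The sketch below assumes int32 columns and uses a `std::vector<std::byte>` in place of the device span. `table_to_array_model` is a hypothetical name, and only the documented "output too small" error is mirrored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model: copies equally sized int32 columns into a
// preallocated byte buffer, one column after another (column-major).
void table_to_array_model(std::vector<std::vector<std::int32_t>> const& columns,
                          std::vector<std::byte>& output)
{
  if (columns.empty()) { return; }
  std::size_t const num_rows     = columns.front().size();
  std::size_t const column_bytes = num_rows * sizeof(std::int32_t);
  if (output.size() < column_bytes * columns.size()) {
    throw std::invalid_argument("output span is too small");
  }
  if (num_rows == 0) { return; }  // nothing to copy
  for (std::size_t c = 0; c < columns.size(); ++c) {
    assert(columns[c].size() == num_rows);  // columns of a table share one length
    std::memcpy(output.data() + c * column_bytes, columns[c].data(), column_bytes);
  }
}
```

Reading the buffer back as int32 values returns the first column in full, followed by the second.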