Column Reshape

group column_reshape

Enums

enum class flip_endianness : bool

Configures whether byte casting flips endianness.

Values:

enumerator NO
enumerator YES

Functions

std::unique_ptr<table> explode(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Explodes a list column’s elements.

Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. Example:

[[5,10,15], 100],
[[20,25],   200],
[[30],      300],
returns
[5,         100],
[10,        100],
[15,        100],
[20,        200],
[25,        200],
[30,        300],

Nulls and empty lists propagate in different ways depending on what is null or empty.

[[5,null,15], 100],
[null,        200],
[[],          300],
returns
[5,           100],
[null,        100],
[15,          100],
Note that, as the example shows, null lists and empty lists are not included in the resulting table, but nulls inside lists are represented with a null entry for that column in that row.

Parameters:
  • input_table – Table to explode.

  • explode_column_idx – Column index to explode inside the table.

  • stream – CUDA stream used for device memory operations and kernel launches.

  • mr – Device memory resource used to allocate the returned table’s device memory.

Returns:

A new table with the column at explode_column_idx exploded.
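The row-dropping rules are easier to see in a host-side reference model. The following is a minimal sketch in plain C++, not libcudf's implementation: `Elem`, `List`, `InRow`, `OutRow`, and `explode_model` are hypothetical names introduced here, and the table is reduced to one nullable list column plus one integer column. Following the example above, a null list and an empty list both produce no output rows, while a null element inside a list is kept.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a two-column table: a nullable list
// column of nullable ints, plus one plain int column.
using Elem   = std::optional<int>;
using List   = std::optional<std::vector<Elem>>;
using InRow  = std::pair<List, int>;
using OutRow = std::pair<Elem, int>;

// Reference model of the semantics in the example above: one output row per
// list element, with the other column's value duplicated.
std::vector<OutRow> explode_model(std::vector<InRow> const& input)
{
  std::vector<OutRow> out;
  for (auto const& [list, other] : input) {
    if (!list) { continue; }       // null list: the row is dropped
    for (auto const& e : *list) {  // empty list: the loop body never runs
      out.emplace_back(e, other);  // null elements are kept as null values
    }
  }
  return out;
}
```

Feeding it the second example input (`[[5,null,15], 100]`, `[null, 200]`, `[[], 300]`) yields three rows, all carrying 100 in the second column.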

std::unique_ptr<table> explode_position(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Explodes a list column’s elements and includes a position column.

Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row. Example:

[[5,10,15], 100],
[[20,25],   200],
[[30],      300],
returns
[0,   5,     100],
[1,   10,    100],
[2,   15,    100],
[0,   20,    200],
[1,   25,    200],
[0,   30,    300],

Nulls and empty lists propagate in different ways depending on what is null or empty.

[[5,null,15], 100],
[null,        200],
[[],          300],
returns
[0,     5,    100],
[1,  null,    100],
[2,    15,    100],
Note that, as the example shows, null lists and empty lists are not included in the resulting table, but nulls inside lists are represented with a null entry for that column in that row.

Parameters:
  • input_table – Table to explode.

  • explode_column_idx – Column index to explode inside the table.

  • stream – CUDA stream used for device memory operations and kernel launches.

  • mr – Device memory resource used to allocate the returned table’s device memory.

Returns:

A new table with the exploded values and their positions. The column order of the returned table is [cols before explode_input, explode_position, explode_value, cols after explode_input].
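The position column can be illustrated with a small host-side model. This is a sketch under stated assumptions rather than libcudf code: `InRow`, `OutRow`, and `explode_position_model` are hypothetical names, and the struct field order mirrors the documented output order (position, then value, then the remaining column).

```cpp
#include <optional>
#include <vector>

using Elem = std::optional<int>;
using List = std::optional<std::vector<Elem>>;

// Hypothetical row types; OutRow lists its fields in the documented order.
struct InRow  { List list; int other; };
struct OutRow { int position; Elem value; int other; };

// Reference model: like explode, but each output row also records the
// element's zero-based index within its source list.
std::vector<OutRow> explode_position_model(std::vector<InRow> const& input)
{
  std::vector<OutRow> out;
  for (auto const& row : input) {
    if (!row.list) { continue; }  // null list: no output rows
    int position = 0;             // restarts at 0 for every input row
    for (auto const& e : *row.list) {
      out.push_back(OutRow{position++, e, row.other});
    }
  }
  return out;
}
```

The counter restarts for every input row, which is why the first example shows positions 0, 1, 2, 0, 1, 0.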

std::unique_ptr<table> explode_outer(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Explodes a list column’s elements retaining any null entries or empty lists inside.

Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. Example:

[[5,10,15], 100],
[[20,25],   200],
[[30],      300],
returns
[5,         100],
[10,        100],
[15,        100],
[20,        200],
[25,        200],
[30,        300],

Nulls and empty lists propagate as null entries in the result.

[[5,null,15], 100],
[null,        200],
[[],          300],
returns
[5,           100],
[null,        100],
[15,          100],
[null,        200],
[null,        300],

Parameters:
  • input_table – Table to explode.

  • explode_column_idx – Column index to explode inside the table.

  • stream – CUDA stream used for device memory operations and kernel launches.

  • mr – Device memory resource used to allocate the returned table’s device memory.

Returns:

A new table with the column at explode_column_idx exploded.
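The difference from explode is confined to null and empty lists, which the following host-side sketch makes explicit. It is an illustrative model only, not libcudf's implementation; `InRow`, `OutRow`, and `explode_outer_model` are hypothetical names.

```cpp
#include <optional>
#include <utility>
#include <vector>

using Elem   = std::optional<int>;
using List   = std::optional<std::vector<Elem>>;
using InRow  = std::pair<List, int>;
using OutRow = std::pair<Elem, int>;

// Reference model of the outer variant: a null or empty list still
// contributes exactly one output row, whose exploded value is null.
std::vector<OutRow> explode_outer_model(std::vector<InRow> const& input)
{
  std::vector<OutRow> out;
  for (auto const& [list, other] : input) {
    if (!list || list->empty()) {
      out.emplace_back(std::nullopt, other);  // row retained, value is null
      continue;
    }
    for (auto const& e : *list) { out.emplace_back(e, other); }
  }
  return out;
}
```

With the second example input, the model returns five rows: three from the first list, then one null-valued row each for the null list (200) and the empty list (300).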

std::unique_ptr<table> explode_outer_position(table_view const &input_table, size_type explode_column_idx, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Explodes a list column’s elements retaining any null entries or empty lists and includes a position column.

Any list is exploded, which means the elements of the list in each row are expanded into new rows in the output. The corresponding rows for other columns in the input are duplicated. A position column is added that has the index inside the original list for each row. Example:

[[5,10,15], 100],
[[20,25],   200],
[[30],      300],
returns
[0,   5,    100],
[1,  10,    100],
[2,  15,    100],
[0,  20,    200],
[1,  25,    200],
[0,  30,    300],

Nulls and empty lists propagate as null entries in the result.

[[5,null,15], 100],
[null,        200],
[[],          300],
returns
[0,     5,    100],
[1,  null,    100],
[2,    15,    100],
[0,  null,    200],
[0,  null,    300],

Parameters:
  • input_table – Table to explode.

  • explode_column_idx – Column index to explode inside the table.

  • stream – CUDA stream used for device memory operations and kernel launches.

  • mr – Device memory resource used to allocate the returned table’s device memory.

Returns:

A new table with the column at explode_column_idx exploded and a position column added.
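A host-side sketch of the combined behavior follows. It is a model of the example above, not libcudf code: `InRow`, `OutRow`, and `explode_outer_position_model` are hypothetical names, and the position 0 assigned to rows that come from null or empty lists is taken directly from that example.

```cpp
#include <optional>
#include <vector>

using Elem = std::optional<int>;
using List = std::optional<std::vector<Elem>>;

struct InRow  { List list; int other; };
struct OutRow { int position; Elem value; int other; };

// Reference model: outer semantics plus a zero-based position column.
std::vector<OutRow> explode_outer_position_model(std::vector<InRow> const& input)
{
  std::vector<OutRow> out;
  for (auto const& row : input) {
    if (!row.list || row.list->empty()) {
      // Null or empty list: one retained row, position 0 as in the example.
      out.push_back(OutRow{0, std::nullopt, row.other});
      continue;
    }
    int position = 0;
    for (auto const& e : *row.list) {
      out.push_back(OutRow{position++, e, row.other});
    }
  }
  return out;
}
```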

std::unique_ptr<column> interleave_columns(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Interleave columns of a table into a single column.

Converts the column major table input into a row major column. Example:

in     = [[A1, A2, A3], [B1, B2, B3]]
return = [A1, B1, A2, B2, A3, B3]

Throws:
Parameters:
  • input – Table containing columns to interleave

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

The interleaved columns as a single column
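The column-major to row-major conversion amounts to an index mapping: the element at row r of column c lands at output index r * num_columns + c. The sketch below models that on the host with string columns. `interleave_model` is a hypothetical name, all columns are assumed to have equal length, and returning an empty result for an empty table is a choice of this sketch only, not a statement about how libcudf handles that case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reference model: walk the rows, and within each row visit the columns in
// order, so the output is the table read row by row.
std::vector<std::string> interleave_model(
  std::vector<std::vector<std::string>> const& columns)
{
  std::vector<std::string> out;
  if (columns.empty()) { return out; }  // sketch-only choice for an empty table
  std::size_t const num_rows = columns.front().size();  // equal lengths assumed
  out.reserve(num_rows * columns.size());
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (auto const& col : columns) { out.push_back(col[r]); }
  }
  return out;
}
```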

std::unique_ptr<table> tile(table_view const &input, size_type count, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Repeats the rows of the input table count times to form a new table.

output.num_columns() == input.num_columns()
output.num_rows() == input.num_rows() * count

input  = [[8, 4, 7], [5, 2, 3]]
count  = 2
return = [[8, 4, 7, 8, 4, 7], [5, 2, 3, 5, 2, 3]]
Parameters:
  • input – Table containing rows to be repeated

  • count – Number of times to tile “rows”. Must be non-negative

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

The table containing the tiled “rows”
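The two size relations can be checked against a small host-side model. This is a sketch, not libcudf's implementation: `Table` and `tile_model` are hypothetical names, the table is stored column-major as in the example, and count is assumed to be non-negative as the parameter description requires.

```cpp
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<int>>;  // column-major: one vector per column

// Reference model: every column is concatenated with itself count times, so
// the whole block of rows repeats in order.
Table tile_model(Table const& input, int count)
{
  Table out(input.size());  // same number of columns as the input
  for (std::size_t c = 0; c < input.size(); ++c) {
    for (int i = 0; i < count; ++i) {
      out[c].insert(out[c].end(), input[c].begin(), input[c].end());
    }
  }
  return out;
}
```

Note that tiling repeats the full row sequence (8, 4, 7, 8, 4, 7) rather than repeating each row in place (8, 8, 4, 4, 7, 7).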

std::unique_ptr<column> byte_cast(column_view const &input_column, flip_endianness endian_configuration, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Converts a column’s elements to lists of bytes.

input<int32>  = [8675, 309]
configuration = flip_endianness::YES
return        = [[0x00, 0x00, 0x21, 0xe3], [0x00, 0x00, 0x01, 0x35]]
Parameters:
  • input_column – Column to be converted to lists of bytes

  • endian_configuration – Whether to retain or flip the endianness of the elements

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

The column containing the lists of bytes
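The example output can be reproduced with a host-side model that copies each element's bytes in memory order and optionally reverses them. This is a sketch for int32 input only, not libcudf's implementation; `flip_model` and `byte_cast_model` are hypothetical stand-ins. On a little-endian host, 8675 (0x000021e3) is stored as e3 21 00 00, so flipping gives the 00 00 21 e3 shown above.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

enum class flip_model : bool { NO, YES };  // hypothetical stand-in for flip_endianness

// Reference model: one list of sizeof(int32_t) bytes per input element.
std::vector<std::vector<std::uint8_t>> byte_cast_model(
  std::vector<std::int32_t> const& input, flip_model flip)
{
  std::vector<std::vector<std::uint8_t>> out;
  out.reserve(input.size());
  for (std::int32_t v : input) {
    std::vector<std::uint8_t> bytes(sizeof v);
    std::memcpy(bytes.data(), &v, sizeof v);  // bytes in host memory order
    if (flip == flip_model::YES) { std::reverse(bytes.begin(), bytes.end()); }
    out.push_back(std::move(bytes));
  }
  return out;
}
```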
