Files | |
file | transform.hpp |
Column APIs for transforming rows. | |
Functions | |
std::unique_ptr< column > | cudf::transform (column_view const &input, std::string const &unary_udf, data_type output_type, bool is_ptx, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Creates a new column by applying a unary function against every element of an input column. | |
std::pair< std::unique_ptr< rmm::device_buffer >, size_type > | cudf::nans_to_nulls (column_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Creates a null_mask from input by converting NaN to null and preserving existing null values and also returns new null_count. | |
std::unique_ptr< column > | cudf::compute_column (table_view const &table, ast::expression const &expr, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Compute a new column by evaluating an expression tree on a table. | |
std::pair< std::unique_ptr< rmm::device_buffer >, cudf::size_type > | cudf::bools_to_mask (column_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Creates a bitmask from a column of boolean elements. | |
std::pair< std::unique_ptr< cudf::table >, std::unique_ptr< cudf::column > > | cudf::encode (cudf::table_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Encode the rows of the given table as integers. | |
std::pair< std::unique_ptr< column >, table_view > | cudf::one_hot_encode (column_view const &input, column_view const &categories, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Encodes input by generating a new column for each value in categories indicating the presence of that value in input. | |
std::unique_ptr< column > | cudf::mask_to_bools (bitmask_type const *bitmask, size_type begin_bit, size_type end_bit, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Creates a boolean column from given bitmask. | |
std::unique_ptr< column > | cudf::row_bit_count (table_view const &t, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Returns an approximate cumulative size in bits of all columns in the table_view for each row. | |
std::unique_ptr< column > | cudf::segmented_row_bit_count (table_view const &t, size_type segment_length, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Returns an approximate cumulative size in bits of all columns in the table_view for each segment of rows. | |
std::pair<std::unique_ptr<rmm::device_buffer>, cudf::size_type> cudf::bools_to_mask (column_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Creates a bitmask from a column of boolean elements.
If element i in input is true, bit i in the resulting mask is set (1). Otherwise, if element i is false or null, bit i is unset (0).
cudf::logic_error | if input.type() is a non-boolean type |
input | Boolean elements to convert to a bitmask |
mr | Device memory resource used to allocate the returned bitmask |
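The mapping from boolean elements to mask bits described above can be sketched on the host. This is only an illustration of the documented semantics, not libcudf's GPU implementation: `bools_to_mask_host` and `mask_word` are hypothetical names, `std::nullopt` stands in for a null element, and the 32-bit word layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side model; libcudf builds the mask on the GPU.
// Assumption: mask words are 32 bits wide, with bit i stored in word i / 32.
using mask_word = std::uint32_t;

// Returns the bitmask and the number of unset bits (the null count).
// std::nullopt models a null element of the boolean input column.
std::pair<std::vector<mask_word>, int> bools_to_mask_host(
  std::vector<std::optional<bool>> const& input)
{
  std::vector<mask_word> mask((input.size() + 31) / 32, 0u);
  int set_count   = 0;
  int unset_count = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i].value_or(false)) {
      mask[i / 32] |= mask_word{1} << (i % 32);  // true -> bit set (1)
      ++set_count;
    } else {
      ++unset_count;  // false or null -> bit stays unset (0)
    }
  }
  // Sanity check: every element is either set or counted as null.
  assert(static_cast<std::size_t>(set_count + unset_count) == input.size());
  return {mask, unset_count};
}
```

For the input `{true, false, null, true}`, the sketch sets bits 0 and 3 and reports two unset bits.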
Returns a pair containing a device_buffer with the new bitmask and its null count, obtained from input by treating true as valid (1) and false as invalid (0).
std::unique_ptr<column> cudf::compute_column (table_view const &table, ast::expression const &expr, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Compute a new column by evaluating an expression tree on a table.
This evaluates an expression over a table to produce a new column, an operation also called an n-ary transform.
cudf::logic_error | if passed an expression operating on table_reference::RIGHT. |
table | The table used for expression evaluation |
expr | The root of the expression tree |
mr | Device memory resource |
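To make "evaluating an expression tree on a table" concrete, here is a minimal host-side sketch. It does not use the `cudf::ast` API: the types `expr`, `column_ref`, `literal`, `binary_op` and the function `compute_column_host` are hypothetical. Columns are plain `double` vectors and nulls are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-side model of an expression tree; not the cudf::ast API.
using host_table = std::vector<std::vector<double>>;  // one vector per column

struct expr {
  virtual ~expr() = default;
  virtual double eval(host_table const& t, std::size_t row) const = 0;
};

// Leaf: reads one column of the table.
struct column_ref : expr {
  std::size_t index;
  explicit column_ref(std::size_t i) : index(i) {}
  double eval(host_table const& t, std::size_t row) const override
  {
    assert(index < t.size());
    return t[index][row];
  }
};

// Leaf: a constant value.
struct literal : expr {
  double value;
  explicit literal(double v) : value(v) {}
  double eval(host_table const&, std::size_t) const override { return value; }
};

// Interior node: combines two child expressions, held by reference.
struct binary_op : expr {
  std::function<double(double, double)> op;
  expr const& lhs;
  expr const& rhs;
  binary_op(std::function<double(double, double)> f, expr const& l, expr const& r)
    : op(std::move(f)), lhs(l), rhs(r)
  {
  }
  double eval(host_table const& t, std::size_t row) const override
  {
    return op(lhs.eval(t, row), rhs.eval(t, row));
  }
};

// Evaluates the tree rooted at `root` once per row to produce the new column.
std::vector<double> compute_column_host(host_table const& t, expr const& root)
{
  std::size_t const rows = t.empty() ? 0 : t.front().size();
  std::vector<double> out(rows);
  for (std::size_t i = 0; i < rows; ++i) { out[i] = root.eval(t, i); }
  return out;
}
```

Building `col0 * 2 + col1` from two `binary_op` nodes and evaluating it over a two-column table yields one output value per row, which is why the operation is described as an n-ary transform.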
std::pair<std::unique_ptr<cudf::table>, std::unique_ptr<cudf::column>> cudf::encode (cudf::table_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Encode the rows of the given table as integers.
The encoded values are integers in the range [0, n), where n is the number of distinct rows in the input table. The result column is such that keys[result[i]] == input[i], where keys is a table containing the distinct rows in input in sorted ascending order. Nulls, if any, are sorted to the end of the keys table.
Examples:
input | Table containing values to be encoded |
mr | Device memory resource used to allocate the returned table's device memory |
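The relationship `keys[result[i]] == input[i]` can be illustrated with a small host-side sketch. The names `encode_host` and `host_row` are hypothetical, rows are modelled as integer vectors compared lexicographically (an assumption of this sketch), and null handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side model; a row is a vector of ints and nulls are omitted.
using host_row = std::vector<int>;

// Returns {keys, result}: the distinct rows in ascending order, and for each
// input row its position within keys.
std::pair<std::vector<host_row>, std::vector<int>> encode_host(
  std::vector<host_row> const& input)
{
  // keys: sort a copy of the input, then drop adjacent duplicates.
  std::vector<host_row> keys(input);
  std::sort(keys.begin(), keys.end());
  keys.erase(std::unique(keys.begin(), keys.end()), keys.end());

  // result[i]: index of input[i] in keys, so every value lies in [0, n).
  std::vector<int> result;
  result.reserve(input.size());
  for (auto const& row : input) {
    auto const it = std::lower_bound(keys.begin(), keys.end(), row);
    assert(it != keys.end() && *it == row);  // every input row has a key
    result.push_back(static_cast<int>(it - keys.begin()));
  }
  return {keys, result};
}
```

With the input rows `{3, 0}, {1, 5}, {3, 0}, {2, 7}`, the sketch produces three keys and the encoding `2, 0, 2, 1`.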
std::unique_ptr<column> cudf::mask_to_bools (bitmask_type const *bitmask, size_type begin_bit, size_type end_bit, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Creates a boolean column from given bitmask.
Returns a bool for each bit in [begin_bit, end_bit). If bit i in least-significant bit numbering is set (1), then element i in the output is true, otherwise false.
cudf::logic_error | if bitmask is null and end_bit-begin_bit > 0 |
cudf::logic_error | if begin_bit > end_bit |
Examples:
bitmask | A device pointer to the bitmask to be converted |
begin_bit | Position of the bit at which the conversion starts |
end_bit | Position of the bit before which the conversion stops |
mr | Device memory resource used to allocate the returned column's device memory |
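The half-open range `[begin_bit, end_bit)` and the least-significant-bit numbering can be sketched on the host as follows. `mask_to_bools_host` is a hypothetical name, the bitmask is a host vector rather than a device pointer, and the 32-bit word size is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model. Assumption: 32-bit mask words, with bit i
// stored in word i / 32 at position i % 32 (least-significant bit first).
std::vector<bool> mask_to_bools_host(std::vector<std::uint32_t> const& bitmask,
                                     int begin_bit,
                                     int end_bit)
{
  // libcudf reports begin_bit > end_bit as cudf::logic_error; this sketch
  // asserts instead. The bounds check is specific to the sketch.
  assert(begin_bit >= 0 && begin_bit <= end_bit);
  assert(static_cast<std::size_t>(end_bit) <= bitmask.size() * 32);

  std::vector<bool> out;
  out.reserve(static_cast<std::size_t>(end_bit - begin_bit));
  for (int bit = begin_bit; bit < end_bit; ++bit) {
    std::uint32_t const word = bitmask[static_cast<std::size_t>(bit) / 32];
    out.push_back(((word >> (bit % 32)) & 1u) != 0u);  // set bit -> true
  }
  return out;
}
```

For a single mask word `0xAA` (binary `10101010`), the range `[0, 8)` gives `false, true, false, true, false, true, false, true`. An empty range gives an empty result.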
std::pair<std::unique_ptr<rmm::device_buffer>, size_type> cudf::nans_to_nulls (column_view const &input, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Creates a null_mask from input by converting NaN to null and preserving existing null values, and also returns the new null_count.
cudf::logic_error | if input.type() is a non-floating type |
input | An immutable view of the input column of floating-point type |
mr | Device memory resource used to allocate the returned bitmask |
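A host-side sketch of the NaN-to-null conversion, under the same caveats as any illustration here: `nans_to_nulls_host` is a hypothetical name, `std::nullopt` models an existing null, and the 32-bit word layout is assumed rather than taken from this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side model; std::nullopt represents an existing null.
// Assumption: 32-bit mask words, bit i stored in word i / 32.
std::pair<std::vector<std::uint32_t>, int> nans_to_nulls_host(
  std::vector<std::optional<double>> const& input)
{
  std::vector<std::uint32_t> mask((input.size() + 31) / 32, 0u);
  int null_count = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    // A row stays valid only if it was non-null and is not NaN.
    bool const valid = input[i].has_value() && !std::isnan(*input[i]);
    if (valid) {
      mask[i / 32] |= std::uint32_t{1} << (i % 32);
    } else {
      ++null_count;  // existing null preserved, or NaN converted to null
    }
  }
  assert(static_cast<std::size_t>(null_count) <= input.size());
  return {mask, null_count};
}
```

For the input `{1.0, NaN, null, 4.0}`, only rows 0 and 3 remain valid, so the null count is 2.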
Returns a pair containing a device_buffer with the new bitmask and its null count, obtained by replacing NaN in input with null.
std::pair<std::unique_ptr<column>, table_view> cudf::one_hot_encode (column_view const &input, column_view const &categories, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Encodes input by generating a new column for each value in categories indicating the presence of that value in input.
The resulting per-category columns are returned concatenated as a single column viewed by a table_view.
The i-th row of the j-th column in the output table equals 1 if input[i] == categories[j], and 0 otherwise.
Examples:
cudf::logic_error | if input and categories are of different types. |
input | Column containing values to be encoded |
categories | Column containing categories |
mr | Device memory resource used to allocate the returned table's device memory |
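The layout described above (one column per category, stored back to back in a single buffer) can be sketched on the host. `one_hot_host` and `one_hot_encode_host` are hypothetical names, values are plain integers, and nulls are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model: all per-category columns live in one buffer,
// and at(i, j) plays the role of the table view over that buffer.
struct one_hot_host {
  std::vector<std::uint8_t> data;  // category columns stored back to back
  std::size_t rows;                // number of rows in each category column

  // Row i of category column j.
  std::uint8_t at(std::size_t i, std::size_t j) const
  {
    assert(i < rows && j * rows + i < data.size());
    return data[j * rows + i];
  }
};

one_hot_host one_hot_encode_host(std::vector<int> const& input,
                                 std::vector<int> const& categories)
{
  one_hot_host out{std::vector<std::uint8_t>(input.size() * categories.size(), 0),
                   input.size()};
  for (std::size_t j = 0; j < categories.size(); ++j) {
    for (std::size_t i = 0; i < input.size(); ++i) {
      // 1 if input[i] == categories[j], and 0 otherwise.
      out.data[j * out.rows + i] = (input[i] == categories[j]) ? 1 : 0;
    }
  }
  return out;
}
```

With `input = {1, 3, 3, 2}` and `categories = {3, 1}`, column 0 is `0, 1, 1, 0` and column 1 is `1, 0, 0, 0`.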
std::unique_ptr<column> cudf::row_bit_count (table_view const &t, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Returns an approximate cumulative size in bits of all columns in the table_view for each row.
This function counts bits instead of bytes to account for the null mask which only has one bit per row.
Each row in the returned column is the sum of the per-row size for each column in the table.
In some cases, this is an inexact approximation. Specifically, columns of lists and strings require N+1 offsets to represent N rows. It is up to the caller to calculate the small additional overhead of the terminating offset for any group of rows being considered.
This function returns the per-row sizes as the columns are currently formed. This can end up being larger than the number you would get by gathering the rows. Specifically, the push-down of struct column validity masks can nullify rows that contain data for string or list columns. In these cases, the size returned is conservative:
row_bit_count(column(x)) >= row_bit_count(gather(column(x)))
t | The table view to perform the computation on |
mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::segmented_row_bit_count (table_view const &t, size_type segment_length, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Returns an approximate cumulative size in bits of all columns in the table_view for each segment of rows.
This is similar to counting bit size per row for the input table in cudf::row_bit_count, except that row sizes are accumulated by segments.
Currently, only fixed-length segments are supported. If the number of rows in the input table is not divisible by segment_length, the last segment is shorter than the others.
std::invalid_argument | if the input segment_length is non-positive or larger than the number of rows in the input table. |
t | The table view to perform the computation on |
segment_length | The number of rows in each segment for which the total size is computed |
mr | Device memory resource used to allocate the returned column's device memory |
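The segment accumulation can be sketched on the host, given per-row sizes that are already known. `segmented_row_bit_count_host` is a hypothetical name, and the input vector stands in for the per-row values that cudf::row_bit_count would report. This page does not say how an empty table is treated, so the sketch simply rejects it along with any other out-of-range `segment_length`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model: sums per-row bit counts over fixed-length
// segments; the last segment may be shorter than the others.
std::vector<long long> segmented_row_bit_count_host(std::vector<long long> const& row_bits,
                                                    int segment_length)
{
  if (segment_length <= 0 || static_cast<std::size_t>(segment_length) > row_bits.size()) {
    throw std::invalid_argument("segment_length must be in [1, number of rows]");
  }
  auto const len = static_cast<std::size_t>(segment_length);
  std::vector<long long> out((row_bits.size() + len - 1) / len, 0LL);
  for (std::size_t i = 0; i < row_bits.size(); ++i) { out[i / len] += row_bits[i]; }

  // Sanity check: segmenting must not change the total size.
  assert(std::accumulate(out.begin(), out.end(), 0LL) ==
         std::accumulate(row_bits.begin(), row_bits.end(), 0LL));
  return out;
}
```

Five rows of 10, 20, 30, 40 and 50 bits with `segment_length = 2` give three segments of 30, 70 and 50 bits. The last segment covers a single row.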
std::unique_ptr<column> cudf::transform (column_view const &input, std::string const &unary_udf, data_type output_type, bool is_ptx, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Creates a new column by applying a unary function against every element of an input column.
Computes: out[i] = F(in[i])
The output null mask is the same as the input null mask, so if input[i] is null then output[i] is also null.
input | An immutable view of the input column to transform |
unary_udf | The PTX/CUDA string of the unary function to apply |
output_type | The output type that is compatible with the output type in the UDF |
is_ptx | true: the UDF is treated as PTX code; false: the UDF is treated as CUDA code |
mr | Device memory resource used to allocate the returned column's device memory |
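The rule `out[i] = F(in[i])` with an unchanged null mask can be sketched on the host. This does not show how libcudf compiles a PTX or CUDA string: `transform_host` is a hypothetical name, `F` is passed as an ordinary callable, and `std::nullopt` models a null element.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical host-side model; the real API takes the UDF as a PTX or CUDA
// source string, whereas this sketch takes an ordinary callable.
std::vector<std::optional<double>> transform_host(
  std::vector<std::optional<double>> const& input,
  std::function<double(double)> const& f)
{
  assert(static_cast<bool>(f));  // the unary function must be callable

  // Elements start out null; only non-null inputs receive a value.
  std::vector<std::optional<double>> out(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i].has_value()) { out[i] = f(*input[i]); }  // out[i] = F(in[i])
  }
  return out;
}
```

Applying `x * x + 1` to `{1.0, null, 3.0}` yields `{2.0, null, 10.0}`. The null stays in the same position because the function is never applied to it.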