Files | |
file | encode.hpp |
Dictionary column encode and decode APIs. | |
Functions | |
std::unique_ptr< column > | cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::UINT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Construct a dictionary column by dictionary encoding an existing column. | |
std::unique_ptr< column > | cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource()) |
Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column. | |
std::unique_ptr<column> cudf::dictionary::decode (dictionary_column_view const & dictionary_column, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.
Parameters | |
dictionary_column | Existing dictionary column |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
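The gather described for decode can be illustrated on the host. The following is a minimal sketch, not the libcudf implementation: `decode_host` is a hypothetical helper, plain `std::vector` containers stand in for device columns, and it assumes that rows marked null in the dictionary column stay null in the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical host-side illustration of decode (not a libcudf function).
// keys:    the unique dictionary values
// indices: one entry per row, each a position in keys
// valid:   per-row null mask (true means the row is non-null)
std::vector<std::optional<int>> decode_host(std::vector<int> const& keys,
                                            std::vector<std::uint32_t> const& indices,
                                            std::vector<bool> const& valid)
{
  assert(indices.size() == valid.size());

  std::vector<std::optional<int>> out;
  out.reserve(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (valid[i]) {
      // Gather: look up the key that this row's index refers to.
      assert(indices[i] < keys.size());
      out.push_back(keys[indices[i]]);
    } else {
      // Assumption: a null row stays null; its index is ignored.
      out.push_back(std::nullopt);
    }
  }
  return out;
}
```

With keys `{3, 5, 9}`, indices `{1, 0, 0, 1, 2}`, and the second row marked null, this sketch yields the rows 5, null, 3, 5, 9.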
std::unique_ptr<column> cudf::dictionary::encode (column_view const & column, data_type indices_type = data_type{type_id::UINT32}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Construct a dictionary column by dictionary encoding an existing column.
The output column is a DICTIONARY type with a keys column of non-null, unique values that are in a strict, total order. That is, keys[i] is ordered before keys[i+1] for all i in [0, n-1), where n is the number of keys.
The output column has a child indices column of the integer type given by indices_type, with the same size as the input column.
The null mask and null count are copied from the input column to the output column.
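The properties above (sorted unique keys, one index per input row, null mask carried over) can be sketched on the host as follows. This is an illustration under stated assumptions rather than the libcudf implementation: `host_dictionary` and `encode_host` are hypothetical names, rows are modeled as `std::optional<int>`, indices are fixed to `std::uint32_t` to mirror the default `indices_type`, and the index stored for a null row is an arbitrary placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column (not a libcudf type).
struct host_dictionary {
  std::vector<int> keys;               // non-null, unique, strictly increasing
  std::vector<std::uint32_t> indices;  // one entry per input row
  std::vector<bool> valid;             // null mask copied from the input
};

// Hypothetical host-side illustration of dictionary encoding.
host_dictionary encode_host(std::vector<std::optional<int>> const& input)
{
  host_dictionary out;

  // Keys: the non-null values, sorted and de-duplicated.
  for (auto const& v : input) {
    if (v) { out.keys.push_back(*v); }
  }
  std::sort(out.keys.begin(), out.keys.end());
  out.keys.erase(std::unique(out.keys.begin(), out.keys.end()), out.keys.end());

  // Every key position must be representable in the index type.
  assert(out.keys.size() <= std::numeric_limits<std::uint32_t>::max());

  // Indices: for each row, the position of its value in keys.
  out.indices.reserve(input.size());
  out.valid.reserve(input.size());
  for (auto const& v : input) {
    out.valid.push_back(v.has_value());
    if (v) {
      auto it = std::lower_bound(out.keys.begin(), out.keys.end(), *v);
      out.indices.push_back(static_cast<std::uint32_t>(it - out.keys.begin()));
    } else {
      out.indices.push_back(0);  // placeholder (assumption); the row is null
    }
  }
  return out;
}
```

For the input rows 5, null, 3, 5, 9, this sketch produces keys `{3, 5, 9}` and indices `{1, 0, 0, 1, 2}`, with the second row still marked null.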
Exceptions | |
cudf::logic_error | if indices_type is not an unsigned integer type |
cudf::logic_error | if the column to encode is already a DICTIONARY type |
Parameters | |
column | The column to dictionary encode |
indices_type | The unsigned integer type to use for the indices |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |