Files | |
| file | encode.hpp |
| Dictionary column encode and decode APIs. | |
Functions | |
| std::unique_ptr< column > | cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::INT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Construct a dictionary column by dictionary encoding an existing column. | |
| std::unique_ptr< column > | cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
| Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column. | |
| std::unique_ptr< column > | cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.
Parameters | |
| dictionary_column | Existing dictionary column |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
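The gather that decode describes can be pictured on the host. The following is a minimal sketch in standard C++, not libcudf code: `host_dictionary` and `host_decode` are hypothetical names invented for illustration, and carrying null rows through as nulls is an assumption, since the description above does not state how decode treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column; not a libcudf type.
struct host_dictionary {
  std::vector<std::string> keys;                     // unique, sorted, non-null values
  std::vector<std::optional<std::int32_t>> indices;  // std::nullopt marks a null row
};

// Gather keys[indices[i]] into a new column.
// Assumption: null rows stay null in the decoded output.
inline std::vector<std::optional<std::string>> host_decode(host_dictionary const& dict)
{
  std::vector<std::optional<std::string>> out;
  out.reserve(dict.indices.size());
  for (auto const& idx : dict.indices) {
    if (!idx.has_value()) {
      out.emplace_back(std::nullopt);
      continue;
    }
    auto const pos = static_cast<std::size_t>(*idx);
    assert(*idx >= 0 && pos < dict.keys.size());  // every index must address a key
    out.emplace_back(dict.keys[pos]);
  }
  return out;
}
```

With keys `{"a", "b", "c"}` and indices `{2, null, 0, 2}`, this sketch yields `{"c", null, "a", "c"}`: the result has one row per index, not one row per key.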
| std::unique_ptr< column > | cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::INT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
Construct a dictionary column by dictionary encoding an existing column.
The output column is a DICTIONARY type with a keys column of non-null, unique values that are in a strict, total order. That is, keys[i] is ordered before keys[i+1] for all i in [0, n-1), where n is the number of keys.
The output column has a child indices column of type indices_type, an integer type, with the same size as the input column. The result is undefined if indices_type is not large enough to hold the index values.
The null mask and null count are copied from the input column to the output column.
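Since the result is undefined when indices_type is too small, a caller may want to check the key count before choosing a narrow index type. The helper below is a hypothetical standard C++ sketch, not part of libcudf. It assumes that "large enough" means the largest index, n-1 for n keys, does not exceed the maximum value of the signed index type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a libcudf API): can signed index type T
// address every one of num_keys keys?
template <typename T>
constexpr bool indices_fit(std::uint64_t num_keys)
{
  static_assert(std::numeric_limits<T>::is_integer && std::numeric_limits<T>::is_signed,
                "this sketch only models signed integer index types");
  // Valid indices are 0 .. num_keys-1, so the largest one must not exceed T's maximum.
  return num_keys == 0 ||
         num_keys - 1 <= static_cast<std::uint64_t>(std::numeric_limits<T>::max());
}
```

Under that assumption, `indices_fit<std::int8_t>(128)` is true because the largest index is 127, while `indices_fit<std::int8_t>(129)` is false.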
Exceptions | |
| std::invalid_argument | if indices_type is not a signed integer type |
| std::invalid_argument | if the column to encode is already a DICTIONARY type |
Parameters | |
| column | The column to dictionary encode |
| indices_type | The integer type to use for the indices |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
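The properties described for encode (sorted, unique, non-null keys, one index per input row, nulls carried over) can be modeled on the host. This is a rough sketch in standard C++ under those stated properties, not libcudf's implementation, which runs on the device. `host_dictionary` and `host_encode` are hypothetical names, and the index type is fixed to a 32-bit signed integer to mirror the default indices_type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column; not a libcudf type.
struct host_dictionary {
  std::vector<std::string> keys;                     // unique, sorted, non-null values
  std::vector<std::optional<std::int32_t>> indices;  // std::nullopt marks a null row
};

inline host_dictionary host_encode(std::vector<std::optional<std::string>> const& input)
{
  host_dictionary out;

  // Keys: the non-null input values, sorted and deduplicated.
  for (auto const& v : input) {
    if (v.has_value()) { out.keys.push_back(*v); }
  }
  std::sort(out.keys.begin(), out.keys.end());
  out.keys.erase(std::unique(out.keys.begin(), out.keys.end()), out.keys.end());

  // Indices: one entry per input row, holding the position of that row's value in keys.
  out.indices.reserve(input.size());
  for (auto const& v : input) {
    if (!v.has_value()) {
      out.indices.emplace_back(std::nullopt);  // null rows stay null
      continue;
    }
    auto const it = std::lower_bound(out.keys.begin(), out.keys.end(), *v);
    out.indices.emplace_back(static_cast<std::int32_t>(it - out.keys.begin()));
  }
  return out;
}
```

For the input `{"pear", "apple", null, "pear", "fig"}`, the sketch produces keys `{"apple", "fig", "pear"}` and indices `{2, 0, null, 2, 1}`. Sorting the keys first is what allows each row's index to be found with a binary search.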