Encoding

Files

file  encode.hpp
 Dictionary column encode and decode APIs.
 

Functions

std::unique_ptr< column > cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::INT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Construct a dictionary column by dictionary encoding an existing column.
 
std::unique_ptr< column > cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.
 

Detailed Description

Function Documentation

◆ decode()

std::unique_ptr<column> cudf::dictionary::decode(
  dictionary_column_view const& dictionary_column,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.

d1 = {["a", "c", "d"], [2, 0, 1, 0]}
s = decode(d1)
s is now ["d", "a", "c", "a"]
Parameters
dictionary_column  Existing dictionary column
stream             CUDA stream used for device memory operations and kernel launches
mr                 Device memory resource used to allocate the returned column's device memory
Returns
New column with type matching the dictionary_column's keys
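The gather described above can be illustrated on the host with plain standard-library containers. This is only a sketch of the semantics, not libcudf's implementation: `host_dictionary` and `decode_sketch` are hypothetical names introduced here, the data lives in `std::vector` rather than device memory, and nulls, streams, and memory resources are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column:
// a keys vector plus an indices vector that refers into it.
struct host_dictionary {
  std::vector<std::string> keys;
  std::vector<std::int32_t> indices;
};

// Gather the keys through the indices: out[i] = keys[indices[i]].
std::vector<std::string> decode_sketch(host_dictionary const& d)
{
  std::vector<std::string> out;
  out.reserve(d.indices.size());
  for (std::int32_t idx : d.indices) {
    // Every index must refer to an existing key.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < d.keys.size());
    out.push_back(d.keys[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

Calling `decode_sketch` on a `host_dictionary` with keys `{"a", "c", "d"}` and indices `{2, 0, 1, 0}` reproduces the example above: the result has one element per index, each looked up in the keys.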

◆ encode()

std::unique_ptr<column> cudf::dictionary::encode(
  column_view const& column,
  data_type indices_type = data_type{type_id::INT32},
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

Construct a dictionary column by dictionary encoding an existing column.

The output column is a DICTIONARY type with a keys column of non-null, unique values that are in a strict, total order. That is, keys[i] is ordered before keys[i+1] for all i in [0, n-1), where n is the number of keys.

The output column has a child indices column of integer type with the same size as the input column.

The null mask and null count are copied from the input column to the output column.

Exceptions
cudf::logic_error  if indices type is not a signed integer type
cudf::logic_error  if the column to encode is already a DICTIONARY type
c = [429, 111, 213, 111, 213, 429, 213]
d = encode(c)
d now has keys [111, 213, 429] and indices [2, 0, 1, 0, 1, 2, 1]
Parameters
column        The column to dictionary encode
indices_type  The integer type to use for the indices
stream        CUDA stream used for device memory operations and kernel launches
mr            Device memory resource used to allocate the returned column's device memory
Returns
New dictionary column
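The relationship between the input column, the sorted unique keys, and the indices can be sketched on the host as follows. This is an illustration of the documented output layout only, under stated assumptions: `encoded_sketch` and `encode_sketch` are hypothetical names, the input is a `std::vector<int>` with no nulls, and nothing here reflects how libcudf computes the result on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical host-side result: sorted unique keys plus one index per input row.
struct encoded_sketch {
  std::vector<int> keys;
  std::vector<std::int32_t> indices;
};

encoded_sketch encode_sketch(std::vector<int> const& column)
{
  encoded_sketch out;

  // Keys: the distinct input values in strictly increasing order.
  out.keys = column;
  std::sort(out.keys.begin(), out.keys.end());
  out.keys.erase(std::unique(out.keys.begin(), out.keys.end()), out.keys.end());

  // Indices: for each input row, the position of its value within the keys.
  out.indices.reserve(column.size());
  for (int v : column) {
    auto it = std::lower_bound(out.keys.begin(), out.keys.end(), v);
    assert(it != out.keys.end() && *it == v);  // every value is present in keys
    out.indices.push_back(
      static_cast<std::int32_t>(std::distance(out.keys.begin(), it)));
  }
  return out;
}
```

Because the keys are sorted, each row's index can be found with a binary search; the indices vector always has the same size as the input, as described above.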