Files
file	encode.hpp
	Dictionary column encode and decode APIs.

Functions
std::unique_ptr< column >	cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::INT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Construct a dictionary column by dictionary encoding an existing column.

std::unique_ptr< column >	cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.

Detailed Description

Function Documentation

◆ decode()

std::unique_ptr<column> cudf::dictionary::decode	(	dictionary_column_view const &	dictionary_column,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	mr = `cudf::get_current_device_resource_ref()`
	)

Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.

d1 = {["a", "c", "d"], [2, 0, 1, 0]}
s = decode(d1)
s is now ["d", "a", "c", "a"]
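
The gather described above can be pictured with a small host-side sketch. This is an illustration only, not the libcudf implementation, which operates on device memory. The helper name decode_on_host is hypothetical, the indices are assumed to be plain int values, and nulls are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Host-side illustration only; decode_on_host is a hypothetical helper,
// not part of libcudf. Nulls are not modeled here.
template <typename Key>
std::vector<Key> decode_on_host(std::vector<Key> const& keys,
                                std::vector<int> const& indices)
{
  std::vector<Key> out;
  out.reserve(indices.size());
  for (int idx : indices) {
    // Every index must refer to an existing key.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < keys.size());
    // Gather: output row i is the key selected by indices[i].
    out.push_back(keys[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

With keys ["a", "c", "d"] and indices [2, 0, 1, 0], the sketch yields the same sequence as the example above.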

Parameters

dictionary_column	Existing dictionary column
stream	CUDA stream used for device memory operations and kernel launches
mr	Device memory resource used to allocate the returned column's device memory

Returns: New column with type matching the dictionary_column's keys

◆ encode()

std::unique_ptr<column> cudf::dictionary::encode	(	column_view const &	column,
		data_type	indices_type = `data_type{type_id::INT32}`,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	mr = `cudf::get_current_device_resource_ref()`
	)

Construct a dictionary column by dictionary encoding an existing column.

The output column is a DICTIONARY type with a keys column of non-null, unique values that are in a strict, total order. That is, keys[i] is ordered before keys[i+1] for all i in [0,n-1), where n is the number of keys.

The output column has a child indices column of integer type with the same size as the input column. The indices column will be of type indices_type. The result is undefined if indices_type is not large enough to hold the index values.
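
Since the largest index stored is n-1 for n keys, a caller can check ahead of time whether a candidate index type is wide enough. The sketch below is a host-side illustration under that assumption. The names indices_fit and to_index are hypothetical and not part of libcudf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if every index in [0, num_keys) is representable
// in the signed integer type IndexType.
template <typename IndexType>
constexpr bool indices_fit(std::size_t num_keys)
{
  static_assert(std::numeric_limits<IndexType>::is_integer &&
                  std::numeric_limits<IndexType>::is_signed,
                "IndexType must be a signed integer type");
  // The largest index actually stored is num_keys - 1.
  return num_keys == 0 ||
         static_cast<std::uintmax_t>(num_keys - 1) <=
           static_cast<std::uintmax_t>(std::numeric_limits<IndexType>::max());
}

// Hypothetical helper: convert a key position to IndexType, asserting that it fits.
template <typename IndexType>
IndexType to_index(std::size_t position)
{
  assert(indices_fit<IndexType>(position + 1));
  return static_cast<IndexType>(position);
}
```

For instance, an 8-bit signed index type can address at most 128 keys (indices 0 through 127), so a column with more distinct values needs a wider indices_type.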

The null mask and null count are copied from the input column to the output column.

Exceptions

cudf::data_type_error	if indices_type is not a signed integer type
std::invalid_argument	if the column to encode is a DICTIONARY or nested type

c = [429, 111, 213, 111, 213, 429, 213]
d = encode(c)
d now has keys [111, 213, 429] and indices [2, 0, 1, 0, 1, 2, 1]
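
The relationship between the input, the sorted unique keys, and the indices can be reproduced on the host with the standard library. This is a sketch for illustration, not how libcudf computes the result on the GPU. The helper name encode_on_host is hypothetical, the index type is fixed to a 32-bit signed integer to mirror the default indices_type, and nulls are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Host-side illustration only; encode_on_host is a hypothetical helper,
// not part of libcudf. Returns {keys, indices}.
template <typename T>
std::pair<std::vector<T>, std::vector<std::int32_t>> encode_on_host(
  std::vector<T> const& input)
{
  // Keys: the distinct input values in strictly increasing order.
  std::vector<T> keys(input);
  std::sort(keys.begin(), keys.end());
  keys.erase(std::unique(keys.begin(), keys.end()), keys.end());

  // Indices: for each input row, the position of its value in keys.
  std::vector<std::int32_t> indices;
  indices.reserve(input.size());
  for (T const& value : input) {
    auto const it = std::lower_bound(keys.begin(), keys.end(), value);
    assert(it != keys.end() && *it == value);  // every value is a key
    indices.push_back(static_cast<std::int32_t>(it - keys.begin()));
  }
  return {std::move(keys), std::move(indices)};
}
```

Applied to [429, 111, 213, 111, 213, 429, 213], the sketch produces the keys and indices listed in the example above. The indices vector always has the same length as the input.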

Parameters

column	The column to dictionary encode
indices_type	The integer type to use for the indices
stream	CUDA stream used for device memory operations and kernel launches
mr	Device memory resource used to allocate the returned column's device memory

Returns: A dictionary column
