Dictionary Encode
- group dictionary_encode
Functions
-
std::unique_ptr<column> encode(column_view const &column, data_type indices_type = data_type{type_id::UINT32}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Construct a dictionary column by dictionary encoding an existing column.
The output column is a DICTIONARY type with a keys column of non-null, unique values that are in a strict, total order. Meaning, keys[i] is ordered before keys[i+1] for all i in [0, n-1), where n is the number of keys.
The output column has a child indices column that is of integer type and with the same size as the input column.
The null mask and null count are copied from the input column to the output column.
c = [429, 111, 213, 111, 213, 429, 213]
d = encode(c)
d now has keys [111, 213, 429] and indices [2, 0, 1, 0, 1, 2, 1]
- Throws:
cudf::logic_error – if indices type is not an unsigned integer type
cudf::logic_error – if the column to encode is already a DICTIONARY type
- Parameters:
column – The column to dictionary encode
indices_type – The integer type to use for the indices
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A dictionary column
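To make the relationship between keys and indices concrete, the following is a minimal host-side sketch in plain C++ that reproduces the result of the example above on the CPU. It is not how libcudf implements encode and it uses no libcudf types: host_dictionary and host_encode are hypothetical names introduced only for illustration, the input is assumed to be a std::vector<int> without nulls, and the indices are stored as 32-bit unsigned integers to mirror the default indices_type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column (not a libcudf type):
// sorted, unique keys plus one index per input row.
struct host_dictionary {
  std::vector<int> keys;
  std::vector<std::uint32_t> indices;
};

// Illustrative CPU version of the documented encode() result.
// Assumption: the input contains no nulls.
host_dictionary host_encode(std::vector<int> const& input) {
  host_dictionary out;

  // Keys: the distinct input values in ascending order.
  out.keys = input;
  std::sort(out.keys.begin(), out.keys.end());
  out.keys.erase(std::unique(out.keys.begin(), out.keys.end()), out.keys.end());

  // Check the documented invariant: keys[i] is ordered strictly before keys[i+1].
  assert(std::adjacent_find(out.keys.begin(), out.keys.end(),
                            std::greater_equal<int>{}) == out.keys.end());

  // Indices: for each input row, the position of its value in the keys.
  out.indices.reserve(input.size());
  for (int value : input) {
    auto pos = std::lower_bound(out.keys.begin(), out.keys.end(), value);
    out.indices.push_back(static_cast<std::uint32_t>(pos - out.keys.begin()));
  }
  return out;
}
```

Calling host_encode({429, 111, 213, 111, 213, 429, 213}) gives keys [111, 213, 429] and indices [2, 0, 1, 0, 1, 2, 1], matching the example for encode. The indices vector always has the same length as the input, as the description requires.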
-
std::unique_ptr<column> decode(dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Create a column by gathering the keys from the provided dictionary_column into a new column using the indices from that column.
d1 = {["a", "c", "d"], [2, 0, 1, 0]}
s = decode(d1)
s is now ["d", "a", "c", "a"]
- Parameters:
dictionary_column – Existing dictionary column
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
New column with type matching the dictionary_column’s keys
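The gather that decode performs can be sketched on the host in a few lines of plain C++. This is only an illustration of the documented behavior, not libcudf's implementation: host_decode is a hypothetical name, the keys are assumed to be strings held in a std::vector, the indices are assumed to be 32-bit unsigned integers, and nulls are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative CPU version of the documented decode() result (hypothetical helper,
// not part of libcudf): each output row is the key selected by that row's index.
std::vector<std::string> host_decode(std::vector<std::string> const& keys,
                                     std::vector<std::uint32_t> const& indices) {
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (std::uint32_t idx : indices) {
    // Every index must refer to an existing key.
    assert(idx < keys.size());
    out.push_back(keys[idx]);
  }
  return out;
}
```

With keys ["a", "c", "d"] and indices [2, 0, 1, 0], host_decode returns ["d", "a", "c", "a"], the same result as the decode example. The output has one row per index and the element type of the keys.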