Files | Functions
Updating Keys

Files

file  update_keys.hpp
 

Functions

std::unique_ptr< columncudf::dictionary::add_keys (dictionary_column_view const &dictionary_column, column_view const &new_keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create a new dictionary column by adding the new keys elements to the existing dictionary_column. More...
 
std::unique_ptr< columncudf::dictionary::remove_keys (dictionary_column_view const &dictionary_column, column_view const &keys_to_remove, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create a new dictionary column by removing the specified keys from the existing dictionary_column. More...
 
std::unique_ptr< columncudf::dictionary::remove_unused_keys (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create a new dictionary column by removing any keys that are not referenced by any of the indices. More...
 
std::unique_ptr< columncudf::dictionary::set_keys (dictionary_column_view const &dictionary_column, column_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create a new dictionary column by applying only the specified keys to the existing dictionary_column. More...
 
std::vector< std::unique_ptr< column > > cudf::dictionary::match_dictionaries (cudf::host_span< dictionary_column_view const > input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Create new dictionaries that have keys merged from the input dictionaries. More...
 

Detailed Description

Function Documentation

◆ add_keys()

std::unique_ptr<column> cudf::dictionary::add_keys ( dictionary_column_view const &  dictionary_column,
column_view const &  new_keys,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create a new dictionary column by adding the new keys elements to the existing dictionary_column.

The indices are updated if any of the new keys are sorted before any of the existing dictionary elements.

d1 = { keys=["a", "c", "d"], indices=[2, 0, 1, 0, 1]}
d2 = add_keys( d1, ["b", "c"] )
d2 is now {keys=["a", "b", "c", "d"], indices=[3, 0, 2, 0, 2]}

The output column will have the same number of rows as the input column. Null entries from the input column are copied to the output column. No new null entries are created by this operation.

Exceptions
cudf::logic_errorif the new_keys type does not match the keys type in the dictionary_column.
cudf::logic_errorif the new_keys contain nulls.
Parameters
dictionary_columnExisting dictionary column.
new_keysNew keys to incorporate into the dictionary_column.
streamCUDA stream used for device memory operations and kernel launches.
mrDevice memory resource used to allocate the returned column's device memory.
Returns
New dictionary column.

◆ match_dictionaries()

std::vector<std::unique_ptr<column> > cudf::dictionary::match_dictionaries ( cudf::host_span< dictionary_column_view const >  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create new dictionaries that have keys merged from the input dictionaries.

This will concatenate the keys for each dictionary and then call set_keys on each. The result is a vector of new dictionaries with a common set of keys.

Parameters
inputDictionary columns to match keys.
streamCUDA stream used for device memory operations and kernel launches.
mrDevice memory resource used to allocate the returned column's device memory.
Returns
New dictionary columns.

◆ remove_keys()

std::unique_ptr<column> cudf::dictionary::remove_keys ( dictionary_column_view const &  dictionary_column,
column_view const &  keys_to_remove,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create a new dictionary column by removing the specified keys from the existing dictionary_column.

The output column will have the same number of rows as the input column. Null entries from the input column and copied to the output column. The indices are updated to the new positions of the remaining keys. Any indices pointing to removed keys sets that row to null.

d1 = {keys=["a", "c", "d"], indices=[2, 0, 1, 0, 2]}
d2 = remove_keys( d1, ["b", "c"] )
d2 is now {keys=["a", "d"], indices=[1, 0, x, 0, 1], valids=[1, 1, 0, 1, 1]}

Note that "a" has been removed so output row[2] becomes null.

Exceptions
cudf::logic_errorif the keys_to_remove type does not match the keys type in the dictionary_column.
cudf::logic_errorif the keys_to_remove contain nulls.
Parameters
dictionary_columnExisting dictionary column.
keys_to_removeThe keys to remove from the dictionary_column.
streamCUDA stream used for device memory operations and kernel launches.
mrDevice memory resource used to allocate the returned column's device memory.
Returns
New dictionary column.

◆ remove_unused_keys()

std::unique_ptr<column> cudf::dictionary::remove_unused_keys ( dictionary_column_view const &  dictionary_column,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create a new dictionary column by removing any keys that are not referenced by any of the indices.

The indices are updated to the new position values of the remaining keys.

d1 = {["a","c","d"],[2,0,2,0]}
d2 = remove_unused_keys(d1)
d2 is now {["a","d"],[1,0,1,0]}
Parameters
dictionary_columnExisting dictionary column.
streamCUDA stream used for device memory operations and kernel launches.
mrDevice memory resource used to allocate the returned column's device memory.
Returns
New dictionary column.

◆ set_keys()

std::unique_ptr<column> cudf::dictionary::set_keys ( dictionary_column_view const &  dictionary_column,
column_view const &  keys,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Create a new dictionary column by applying only the specified keys to the existing dictionary_column.

Any new elements found in the keys parameter are added to the output dictionary. Any existing keys not in the keys parameter are removed.

The number of rows in the output column will be the same as the number of rows in the input column. Existing null entries are copied to the output column. The indices are updated to reflect the position values of the new keys. Any indices pointing to removed keys sets those rows to null.

d1 = {keys=["a", "b", "c"], indices=[2, 0, 1, 2, 1]}
d2 = set_keys(existing_dict, ["b","c","d"])
d2 is now {keys=["b", "c", "d"], indices=[1, x, 0, 1, 0], valids=[1, 0, 1, 1, 1]}
Exceptions
cudf::logic_errorif the keys type does not match the keys type in the dictionary_column.
cudf::logic_errorif the keys contain nulls.
Parameters
dictionary_columnExisting dictionary column.
keysNew keys to use for the output column. Must not contain nulls.
streamCUDA stream used for device memory operations and kernel launches.
mrDevice memory resource used to allocate the returned column's device memory.
Returns
New dictionary column.