cudf::key_remapping Class Reference

Remaps keys to unique integer IDs.

#include <key_remapping.hpp>

Public Member Functions

 key_remapping (key_remapping const &)=delete
 
 key_remapping (key_remapping &&)=delete
 
key_remapping & operator= (key_remapping const &)=delete
 
key_remapping & operator= (key_remapping &&)=delete
 
 key_remapping (cudf::table_view const &build, null_equality compare_nulls=null_equality::EQUAL, cudf::compute_metrics metrics=cudf::compute_metrics::YES, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructs a key remapping structure from the given build keys.
 
std::unique_ptr< cudf::column > remap_build_keys (rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Remap build keys to integer IDs.
 
std::unique_ptr< cudf::column > remap_probe_keys (cudf::table_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Remap probe keys to integer IDs.
 
bool has_metrics () const
 Check if metrics (distinct_count, max_duplicate_count) were computed.
 
size_type get_distinct_count () const
 Get the number of distinct keys in the build table.
 
size_type get_max_duplicate_count () const
 Get the maximum number of times any single key appears.
 

Detailed Description

Remaps keys to unique integer IDs.

Each distinct key in the build table is assigned a unique non-negative integer ID. Rows with equal keys map to the same ID. Keys that cannot be mapped (e.g., probe keys not present in the build table, or null keys when nulls are unequal) receive negative sentinel values. The specific ID values are stable for the lifetime of this object but are otherwise unspecified.
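
The ID assignment described above can be modeled on the host with standard C++. This is a sketch of the semantics only, not the library's GPU implementation. The names assign_ids, remap_keys and NOT_FOUND are hypothetical, the sentinel value -1 is an assumption (the documentation promises only a negative value), and handing out IDs in order of first appearance is a choice of this sketch, since the real ID values are unspecified.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sentinel: the documentation only guarantees "negative".
constexpr std::int32_t NOT_FOUND = -1;

// Assign each distinct build key an ID. First-appearance order is an
// assumption of this model; the real ID values are unspecified.
std::unordered_map<int, std::int32_t> assign_ids(std::vector<int> const& build)
{
  std::unordered_map<int, std::int32_t> ids;
  for (int key : build) {
    // try_emplace inserts only when the key is new, so duplicates keep
    // the ID given to their first occurrence.
    ids.try_emplace(key, static_cast<std::int32_t>(ids.size()));
  }
  return ids;
}

// Look up each key; keys without an assigned ID get the negative sentinel.
std::vector<std::int32_t> remap_keys(std::unordered_map<int, std::int32_t> const& ids,
                                     std::vector<int> const& keys)
{
  std::vector<std::int32_t> out;
  out.reserve(keys.size());
  for (int key : keys) {
    auto const it = ids.find(key);
    out.push_back(it != ids.end() ? it->second : NOT_FOUND);
  }
  return out;
}
```

In this model, build keys {10, 20, 10, 30} remap to {0, 1, 0, 2}, and probe keys {30, 99, 10} remap to {2, -1, 0}.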

Note
The build table must remain valid for the lifetime of this object, as the hash table references it directly without copying.
All NaNs are considered equal.
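
Treating all NaNs as equal differs from ordinary floating-point comparison, where NaN == NaN is false. The following host-side sketch shows one way to get that behavior with a standard container. The names nan_equal, nan_hash and count_distinct are hypothetical, and nothing here is taken from the library's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Equality that treats every NaN as equal to every other NaN.
struct nan_equal {
  bool operator()(double a, double b) const
  {
    if (std::isnan(a) && std::isnan(b)) { return true; }
    return a == b;
  }
};

// Hash that sends every NaN bit pattern to the same value, so that
// keys considered equal by nan_equal also hash equally.
struct nan_hash {
  std::size_t operator()(double v) const
  {
    if (std::isnan(v)) { return 0; }
    return std::hash<double>{}(v);
  }
};

// Count distinct keys under the "all NaNs are equal" rule.
std::size_t count_distinct(std::vector<double> const& keys)
{
  std::unordered_set<double, nan_hash, nan_equal> seen(keys.begin(), keys.end());
  return seen.size();
}
```

Under this rule, a key column holding 1.0, two NaNs with different bit patterns, and 1.0 again has two distinct keys.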

Definition at line 69 of file key_remapping.hpp.

Constructor & Destructor Documentation

◆ key_remapping()

cudf::key_remapping::key_remapping ( cudf::table_view const &  build,
null_equality  compare_nulls = null_equality::EQUAL,
cudf::compute_metrics  metrics = cudf::compute_metrics::YES,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructs a key remapping structure from the given build keys.

Exceptions
cudf::logic_error  if the build table has no columns
Parameters
build  The build table containing the keys to remap
compare_nulls  Controls whether null key values should match or not. When EQUAL, null keys are treated as equal and assigned a valid non-negative ID. When UNEQUAL, rows with null keys receive a negative sentinel value.
metrics  Controls whether to compute distinct_count and max_duplicate_count. If YES (default), compute metrics for later retrieval via get_distinct_count() and get_max_duplicate_count(). If NO, skip metrics computation for better performance; calling get_distinct_count() or get_max_duplicate_count() will throw.
stream  CUDA stream used for device memory operations and kernel launches
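
The effect of compare_nulls can be illustrated with a small host-side model in which std::optional stands in for a nullable key. The enum null_policy, the constant NULL_SENTINEL and the function remap_with_nulls are hypothetical stand-ins; the value -1 and the first-appearance ID order are assumptions of this sketch, not documented behavior.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for null_equality.
enum class null_policy { EQUAL, UNEQUAL };

// Hypothetical sentinel: the documentation only guarantees "negative".
constexpr std::int32_t NULL_SENTINEL = -1;

// Remap nullable build keys. std::nullopt models a null key.
std::vector<std::int32_t> remap_with_nulls(std::vector<std::optional<int>> const& build,
                                           null_policy policy)
{
  // std::optional compares nullopt equal to nullopt and orders it before
  // any value, so it can be used directly as an ordered map key.
  std::map<std::optional<int>, std::int32_t> ids;
  std::vector<std::int32_t> out;
  out.reserve(build.size());
  for (auto const& key : build) {
    if (!key.has_value() && policy == null_policy::UNEQUAL) {
      // Nulls never match anything, so they receive no ID.
      out.push_back(NULL_SENTINEL);
      continue;
    }
    // Under EQUAL, all nulls share one ID like any other repeated key.
    auto const it = ids.try_emplace(key, static_cast<std::int32_t>(ids.size())).first;
    out.push_back(it->second);
  }
  return out;
}
```

For the keys {5, null, 5, null}, this model returns {0, 1, 0, 1} under EQUAL and {0, -1, 0, -1} under UNEQUAL.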

Member Function Documentation

◆ get_distinct_count()

size_type cudf::key_remapping::get_distinct_count ( ) const

Get the number of distinct keys in the build table.

Exceptions
cudf::logic_error  if metrics was NO during construction
Returns
The count of unique key combinations found during build

◆ get_max_duplicate_count()

size_type cudf::key_remapping::get_max_duplicate_count ( ) const

Get the maximum number of times any single key appears.

Exceptions
cudf::logic_error  if metrics was NO during construction
Returns
The maximum duplicate count across all distinct keys

◆ has_metrics()

bool cudf::key_remapping::has_metrics ( ) const

Check if metrics (distinct_count, max_duplicate_count) were computed.

Returns
true if metrics are available, false if metrics was NO during construction
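
The three metrics accessors can be pictured with the following host-side model. It shows what distinct_count and max_duplicate_count measure and why callers may want to check has_metrics() before calling a getter. The class host_metrics is hypothetical, a bool replaces cudf::compute_metrics, and std::logic_error stands in for cudf::logic_error; none of this reflects how the library stores or computes its metrics.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical host-side model of optional metrics.
class host_metrics {
 public:
  host_metrics(std::vector<int> const& build, bool compute)
  {
    if (!compute) { return; }  // models metrics == NO: nothing is stored
    std::unordered_map<int, std::int32_t> counts;
    for (int key : build) { ++counts[key]; }
    std::int32_t max_dup = 0;
    for (auto const& kv : counts) { max_dup = std::max(max_dup, kv.second); }
    distinct_count_      = static_cast<std::int32_t>(counts.size());
    max_duplicate_count_ = max_dup;
  }

  bool has_metrics() const { return distinct_count_.has_value(); }

  std::int32_t get_distinct_count() const { return checked(distinct_count_); }
  std::int32_t get_max_duplicate_count() const { return checked(max_duplicate_count_); }

 private:
  // Throws when metrics were skipped at construction.
  static std::int32_t checked(std::optional<std::int32_t> const& value)
  {
    if (!value.has_value()) { throw std::logic_error("metrics were not computed"); }
    return *value;
  }

  std::optional<std::int32_t> distinct_count_;
  std::optional<std::int32_t> max_duplicate_count_;
};
```

For the build keys {7, 7, 8, 9, 7}, this model reports a distinct count of 3 and a maximum duplicate count of 3, because the key 7 appears three times.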

◆ remap_build_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_build_keys ( rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Remap build keys to integer IDs.

Recomputes the remapped ID column from the cached build keys. The result is not cached; each call recomputes it from the key remapping.

For each row in the cached build table, returns the integer ID assigned to that key. Non-negative integers represent valid mapped keys, while negative values represent keys that cannot be mapped (e.g., null keys when nulls are unequal).

Parameters
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of INT32 values with the remapped key IDs

◆ remap_probe_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_probe_keys ( cudf::table_view const &  keys,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Remap probe keys to integer IDs.

For each row in the input, returns the integer ID assigned to that key. Non-negative integers represent keys found in the build table, while negative values represent keys that are not present in the build table or cannot be matched (e.g., null keys when nulls are unequal).

Exceptions
std::invalid_argument  if keys has a different number of columns than the build table
cudf::data_type_error  if keys has different column types than the build table
Parameters
keys  The probe keys to remap (must have the same schema as the build table)
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of INT32 values with the remapped key IDs
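
The probe-side contract, including the column-count check, can be sketched with a host-side model that works on multi-column keys. Everything named here (host_column, host_table, row_at, host_key_remapper) is hypothetical. The model copies the build keys, unlike the real class, which references the build table. It uses -1 as the sentinel and first-appearance ID order as assumptions, and it omits the column-type check because every model column holds int.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using host_column = std::vector<int>;
using host_table  = std::vector<host_column>;  // column-major, equal-length columns

// Gather row i across all columns into one composite key.
std::vector<int> row_at(host_table const& table, std::size_t i)
{
  std::vector<int> row;
  row.reserve(table.size());
  for (auto const& column : table) { row.push_back(column[i]); }
  return row;
}

// Hypothetical host-side model of the build/probe contract.
class host_key_remapper {
 public:
  explicit host_key_remapper(host_table const& build) : num_columns_{build.size()}
  {
    if (build.empty()) { throw std::logic_error("build table has no columns"); }
    for (std::size_t i = 0; i < build.front().size(); ++i) {
      ids_.try_emplace(row_at(build, i), static_cast<std::int32_t>(ids_.size()));
    }
  }

  std::vector<std::int32_t> remap_probe_keys(host_table const& keys) const
  {
    if (keys.size() != num_columns_) {
      throw std::invalid_argument("probe keys and build table differ in column count");
    }
    std::vector<std::int32_t> out;
    for (std::size_t i = 0; i < keys.front().size(); ++i) {
      auto const it = ids_.find(row_at(keys, i));
      out.push_back(it != ids_.end() ? it->second : -1);  // -1: assumed sentinel
    }
    return out;
  }

 private:
  std::size_t num_columns_;
  std::map<std::vector<int>, std::int32_t> ids_;
};
```

With build rows (1, 10), (1, 10), (2, 20), probing rows (2, 20), (1, 10), (3, 30) yields {1, 0, -1} in this model, and probing with a single-column table throws std::invalid_argument.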
