Remaps keys to unique integer IDs.
#include <key_remapping.hpp>
Public Member Functions | |
| key_remapping (key_remapping const &)=delete | |
| key_remapping (key_remapping &&)=delete | |
| key_remapping & | operator= (key_remapping const &)=delete |
| key_remapping & | operator= (key_remapping &&)=delete |
| key_remapping (cudf::table_view const &build, null_equality compare_nulls=null_equality::EQUAL, cudf::compute_metrics metrics=cudf::compute_metrics::YES, rmm::cuda_stream_view stream=cudf::get_default_stream()) | |
| Constructs a key remapping structure from the given build keys. | |
| std::unique_ptr< cudf::column > | remap_build_keys (rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const |
| Remap build keys to integer IDs. | |
| std::unique_ptr< cudf::column > | remap_probe_keys (cudf::table_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const |
| Remap probe keys to integer IDs. | |
| bool | has_metrics () const |
| Check if metrics (distinct_count, max_duplicate_count) were computed. | |
| size_type | get_distinct_count () const |
| Get the number of distinct keys in the build table. | |
| size_type | get_max_duplicate_count () const |
| Get the maximum number of times any single key appears. | |
Remaps keys to unique integer IDs.
Each distinct key in the build table is assigned a unique non-negative integer ID. Rows with equal keys map to the same ID. Keys that cannot be mapped (e.g., probe keys not found in the build table, or null keys when nulls are unequal) receive negative sentinel values. The specific ID values are stable for the lifetime of this object but are otherwise unspecified.
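The mapping contract described above can be illustrated with a small host-side model. This is only a sketch in standard C++, not the cudf implementation: `host_key_remapping`, `NOT_FOUND`, the string keys, and the first-appearance ID order are all assumptions made for illustration. The library leaves the actual ID and sentinel values unspecified.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sentinel for "no matching build key". The real library only
// promises that unmapped keys receive some negative value.
constexpr std::int32_t NOT_FOUND = -1;

// Host-side stand-in for the remapping (an illustration, not cudf code).
struct host_key_remapping {
  std::unordered_map<std::string, std::int32_t> ids;

  explicit host_key_remapping(std::vector<std::string> const& build)
  {
    for (auto const& key : build) {
      // emplace inserts only when the key is new, so equal keys share one ID.
      ids.emplace(key, static_cast<std::int32_t>(ids.size()));
    }
  }

  // Returns one ID per input row; keys absent from the build set get NOT_FOUND.
  std::vector<std::int32_t> remap(std::vector<std::string> const& keys) const
  {
    std::vector<std::int32_t> out;
    out.reserve(keys.size());
    for (auto const& key : keys) {
      auto const it = ids.find(key);
      out.push_back(it == ids.end() ? NOT_FOUND : it->second);
    }
    return out;
  }
};
```

Remapping the build keys themselves with this model yields only non-negative IDs, while remapping a probe key that never appeared in the build set yields the negative sentinel.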
Definition at line 69 of file key_remapping.hpp.
| cudf::key_remapping::key_remapping (cudf::table_view const &build, null_equality compare_nulls=null_equality::EQUAL, cudf::compute_metrics metrics=cudf::compute_metrics::YES, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
Constructs a key remapping structure from the given build keys.
| cudf::logic_error | if the build table has no columns |
| build | The build table containing the keys to remap |
| compare_nulls | Controls whether null key values should match or not. When EQUAL, null keys are treated as equal and assigned a valid non-negative ID. When UNEQUAL, rows with null keys receive a negative sentinel value. |
| metrics | Controls whether to compute distinct_count and max_duplicate_count. If YES (default), compute metrics for later retrieval via get_distinct_count() and get_max_duplicate_count(). If NO, skip metrics computation for better performance; calling get_distinct_count() or get_max_duplicate_count() then throws cudf::logic_error. |
| stream | CUDA stream used for device memory operations and kernel launches |
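To make the two metrics concrete, the following host-side sketch computes them for a plain vector of integer keys. It is an illustration in standard C++ rather than the library's implementation: `key_metrics` and `compute_key_metrics` are hypothetical names, `std::size_t` stands in for `size_type`, and null handling is left out because this page does not say how nulls are counted.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical holder for the two metrics named in the documentation.
struct key_metrics {
  std::size_t distinct_count;       // number of distinct keys
  std::size_t max_duplicate_count;  // largest number of rows sharing one key
};

// Counts occurrences per key, then derives both metrics from the counts.
key_metrics compute_key_metrics(std::vector<int> const& build_keys)
{
  std::unordered_map<int, std::size_t> occurrences;
  for (int key : build_keys) { ++occurrences[key]; }

  std::size_t max_duplicates = 0;
  for (auto const& entry : occurrences) {
    max_duplicates = std::max(max_duplicates, entry.second);
  }
  return key_metrics{occurrences.size(), max_duplicates};
}
```

For the keys `{7, 7, 3, 7, 9}` this model reports three distinct keys and a maximum duplicate count of three, since the key 7 appears three times.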
| size_type cudf::key_remapping::get_distinct_count () const |
Get the number of distinct keys in the build table.
| cudf::logic_error | if metrics was NO during construction |
| size_type cudf::key_remapping::get_max_duplicate_count () const |
Get the maximum number of times any single key appears.
| cudf::logic_error | if metrics was NO during construction |
| bool cudf::key_remapping::has_metrics () const |
Check if metrics (distinct_count, max_duplicate_count) were computed.
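Because the metric getters throw when metrics were skipped, callers may want to check `has_metrics()` first. The sketch below models that guard with a hypothetical `metrics_model` class in standard C++. It throws `std::logic_error` as a stand-in for `cudf::logic_error`, and `distinct_count_or` is an illustrative helper, not part of the library.

```cpp
#include <stdexcept>

// Hypothetical model of an object whose metrics may or may not be computed.
class metrics_model {
 public:
  // Models construction with metrics disabled.
  metrics_model() : has_metrics_{false}, distinct_count_{0} {}
  // Models construction with metrics enabled.
  explicit metrics_model(int distinct_count)
    : has_metrics_{true}, distinct_count_{distinct_count}
  {
  }

  bool has_metrics() const { return has_metrics_; }

  // Throws when metrics were not computed, mirroring the documented behavior.
  int get_distinct_count() const
  {
    if (!has_metrics_) { throw std::logic_error("metrics were not computed"); }
    return distinct_count_;
  }

 private:
  bool has_metrics_;
  int distinct_count_;
};

// Caller-side guard: query the metric only when it is available.
int distinct_count_or(metrics_model const& model, int fallback)
{
  return model.has_metrics() ? model.get_distinct_count() : fallback;
}
```

Checking first keeps the exception for genuine misuse instead of using it for control flow.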
| std::unique_ptr<cudf::column> cudf::key_remapping::remap_build_keys (rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const |
Remap build keys to integer IDs.
Recomputes the remapped build column from the cached build keys on every call; the result is not cached.
For each row in the cached build table, returns the integer ID assigned to that key. Non-negative integers represent valid mapped keys, while negative values represent keys that cannot be mapped (e.g., null keys when nulls are unequal).
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
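The effect of the null-equality setting on build keys can be sketched on the host as follows. This is an assumed model in standard C++, not the library's code: `nullable_key`, `null_policy`, `NULL_SENTINEL`, and `remap_build` are hypothetical names, and the concrete ID and sentinel values are illustrative only.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical nullable key: is_valid == false models a null row.
struct nullable_key {
  bool is_valid;
  int value;
};

// Hypothetical stand-in for cudf::null_equality.
enum class null_policy { EQUAL, UNEQUAL };

// Hypothetical negative sentinel for rows that cannot be mapped.
constexpr std::int32_t NULL_SENTINEL = -1;

std::vector<std::int32_t> remap_build(std::vector<nullable_key> const& build,
                                      null_policy policy)
{
  std::unordered_map<int, std::int32_t> ids;
  std::int32_t next_id = 0;
  std::int32_t null_id = -1;  // assigned on the first null seen under EQUAL
  std::vector<std::int32_t> out;
  out.reserve(build.size());

  for (auto const& key : build) {
    if (!key.is_valid) {
      if (policy == null_policy::UNEQUAL) {
        out.push_back(NULL_SENTINEL);  // nulls never match anything
      } else {
        if (null_id < 0) { null_id = next_id++; }
        out.push_back(null_id);  // all nulls share one valid ID
      }
      continue;
    }
    auto it = ids.find(key.value);
    if (it == ids.end()) { it = ids.emplace(key.value, next_id++).first; }
    out.push_back(it->second);
  }
  return out;
}
```

Under `EQUAL`, every null row in this model receives the same non-negative ID; under `UNEQUAL`, every null row receives the negative sentinel while valid keys are unaffected.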
| std::unique_ptr<cudf::column> cudf::key_remapping::remap_probe_keys (cudf::table_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const |
Remap probe keys to integer IDs.
For each row in the input, returns the integer ID assigned to that key. Non-negative integers represent keys found in the build table. Negative values represent keys that are not present in the build table or that cannot be matched (e.g., null keys when nulls are unequal).
| std::invalid_argument | if keys has a different number of columns than the build table |
| cudf::data_type_error | if keys has different column types than the build table |
| keys | The probe keys to remap (must have the same schema as the build table) |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
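The two schema checks listed above can be modeled on the host like this. The sketch is an assumption-laden illustration in standard C++: schemas are represented as vectors of type-name strings, `check_probe_schema` is a hypothetical helper, and `type_mismatch` is a made-up exception standing in for `cudf::data_type_error`. The order in which the library performs the checks is not stated on this page; the model checks the column count first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for cudf::data_type_error.
struct type_mismatch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Validates that the probe schema matches the build schema column by column.
void check_probe_schema(std::vector<std::string> const& build_types,
                        std::vector<std::string> const& probe_types)
{
  if (build_types.size() != probe_types.size()) {
    throw std::invalid_argument("probe keys have a different number of columns");
  }
  for (std::size_t i = 0; i < build_types.size(); ++i) {
    if (build_types[i] != probe_types[i]) {
      throw type_mismatch("probe column " + std::to_string(i) +
                          " has a different type than the build column");
    }
  }
}
```

Validating the schema before any lookup means a mismatched probe table fails early with a specific error instead of producing meaningless IDs.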