Class KeyRemapping

java.lang.Object
ai.rapids.cudf.KeyRemapping
All Implemented Interfaces:
AutoCloseable

public class KeyRemapping extends Object implements AutoCloseable
Remaps keys to unique integer IDs.

Each distinct key in the build table is assigned a unique non-negative integer ID, and rows with equal keys map to the same ID. Keys that cannot be mapped (e.g., probe keys not found in the build table, or null keys when nulls are unequal) receive negative sentinel values. The specific ID values are stable for the lifetime of this object but are otherwise unspecified.
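
The contract above can be modeled on the CPU with a plain hash map. The following is a minimal sketch of the semantics only, not the cudf implementation. The class KeyRemapModel, its single String key column, and the sentinel values -1 and -2 are assumptions made for illustration; the real constants are NOT_FOUND_SENTINEL and BUILD_NULL_SENTINEL, whose values this page does not state. Returning the not-found sentinel for a null probe key when nulls are unequal is also an assumption, since this page only says such keys receive a negative value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** CPU-side model of the remapping contract; not the cudf implementation. */
class KeyRemapModel {
  // Hypothetical sentinel values, chosen for this sketch only.
  static final int NOT_FOUND = -1;
  static final int BUILD_NULL = -2;

  private final Map<String, Integer> ids = new HashMap<>();
  private final List<String> buildKeys;
  private final boolean nullsEqual;

  KeyRemapModel(List<String> buildKeys, boolean nullsEqual) {
    this.buildKeys = new ArrayList<>(buildKeys);
    this.nullsEqual = nullsEqual;
    for (String key : this.buildKeys) {
      if (key == null && !nullsEqual) {
        continue; // null keys get no ID when nulls compare unequal
      }
      // HashMap accepts a null key, so all nulls share one ID when nullsEqual is true.
      if (!ids.containsKey(key)) {
        ids.put(key, ids.size());
      }
    }
  }

  /** One ID per build row; equal keys produce equal IDs. */
  int[] remapBuild() {
    int[] out = new int[buildKeys.size()];
    for (int i = 0; i < out.length; i++) {
      String key = buildKeys.get(i);
      out[i] = (key == null && !nullsEqual) ? BUILD_NULL : ids.get(key);
    }
    return out;
  }

  /** One ID per probe row; keys absent from the build side get a negative value. */
  int[] remapProbe(List<String> probeKeys) {
    int[] out = new int[probeKeys.size()];
    for (int i = 0; i < out.length; i++) {
      String key = probeKeys.get(i);
      Integer id = (key == null && !nullsEqual) ? null : ids.get(key);
      out[i] = (id == null) ? NOT_FOUND : id;
    }
    return out;
  }
}
```

The model assigns IDs in first-seen order, but callers of the real class should rely only on the documented guarantees: equal keys share an ID, mapped IDs are non-negative, and unmapped keys are negative.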

Ownership: This class increments the reference counts on the columns from the provided build keys table. The underlying column data is shared, not copied. When this object is closed, it will decrement those reference counts. The original table passed to the constructor is not affected and the caller retains ownership of it.

For advanced memory management (e.g., spilling), use releaseBuildKeys() to take ownership of the internal build keys table. After calling this method, the caller is responsible for ensuring the returned table remains valid for the lifetime of this object and for closing it when appropriate.

Usage pattern:


 try (KeyRemapping remap = new KeyRemapping(buildKeys, NullEquality.EQUAL)) {
   // Remap build keys (recomputes from cached build table)
   try (ColumnVector remappedBuild = remap.remapBuildKeys()) {
     // Remap probe keys
     try (ColumnVector remappedProbe = remap.remapProbeKeys(probeKeys)) {
       // Use remapped integer keys
     }
   }
 }
 

  • Field Details

    • NOT_FOUND_SENTINEL

      public static final int NOT_FOUND_SENTINEL
      Sentinel value for probe-side keys not found in build table.

      This constant is primarily exposed for testing purposes. It must be kept in sync with KEY_REMAP_NOT_FOUND in cudf/join/key_remapping.hpp.

    • BUILD_NULL_SENTINEL

      public static final int BUILD_NULL_SENTINEL
      Sentinel value for build-side rows with null keys (when nulls are not equal).

      This constant is primarily exposed for testing purposes. It must be kept in sync with KEY_REMAP_BUILD_NULL in cudf/join/key_remapping.hpp.

  • Constructor Details

    • KeyRemapping

      public KeyRemapping(Table buildKeys, NullEquality nullEquality, boolean computeMetrics)
      Construct a key remapping structure from build keys.

      This constructor increments the reference counts on the columns from the provided table, creating a shared reference to the underlying column data. The original table is not affected and the caller retains ownership of it.

      Parameters:
      buildKeys - table containing the keys to build from. The column reference counts will be incremented; the caller retains ownership of this table.
      nullEquality - how null key values should be compared. When EQUAL, null keys are treated as equal and assigned a valid non-negative ID. When UNEQUAL, rows with null keys receive a negative sentinel value.
      computeMetrics - if true, compute distinctCount and maxDuplicateCount. If false, skip metrics computation for better performance; calling getDistinctCount() or getMaxDuplicateCount() will then throw IllegalStateException.
    • KeyRemapping

      public KeyRemapping(Table buildKeys, NullEquality nullEquality)
      Construct a key remapping structure from build keys with metrics computation enabled.
      Parameters:
      buildKeys - table containing the keys to build from
      nullEquality - how null key values should be compared
    • KeyRemapping

      public KeyRemapping(Table buildKeys)
      Construct a key remapping structure from build keys with nulls comparing equal and metrics computation enabled.
      Parameters:
      buildKeys - table containing the keys to build from
  • Method Details

    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
    • getNullEquality

      public NullEquality getNullEquality()
      Returns the null equality setting used when building the hash table.
      Returns:
      the NullEquality setting
    • hasMetrics

      public boolean hasMetrics()
      Check if metrics (distinctCount, maxDuplicateCount) were computed.
      Returns:
      true if metrics are available, false if computeMetrics was false during construction
    • getDistinctCount

      public int getDistinctCount()
      Get the number of distinct keys in the build table.
      Returns:
      The count of unique key combinations found during build
      Throws:
      IllegalStateException - if computeMetrics was false during construction
    • getMaxDuplicateCount

      public int getMaxDuplicateCount()
      Get the maximum number of times any single key appears in the build table.
      Returns:
      The maximum duplicate count across all distinct keys
      Throws:
      IllegalStateException - if computeMetrics was false during construction
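
To make the two metrics concrete, here is a small CPU-side sketch of what they measure. It is an illustration only: the class BuildKeyMetrics is hypothetical, it uses a single String key column, and it assumes non-null keys because this page does not state how null keys contribute to either count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of distinctCount and maxDuplicateCount; not the cudf implementation. */
class BuildKeyMetrics {
  final int distinctCount;      // number of distinct keys seen
  final int maxDuplicateCount;  // largest number of rows sharing one key

  BuildKeyMetrics(List<String> buildKeys) {
    Map<String, Integer> occurrences = new HashMap<>();
    int max = 0;
    for (String key : buildKeys) {
      // merge() returns the updated occurrence count for this key.
      int n = occurrences.merge(key, 1, Integer::sum);
      max = Math.max(max, n);
    }
    this.distinctCount = occurrences.size();
    this.maxDuplicateCount = max;
  }
}
```

For the keys a, b, a, c, a this model reports a distinct count of 3 and a maximum duplicate count of 3, since a appears three times.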
    • releaseBuildKeys

      public Table releaseBuildKeys()
      Release ownership of the internal build keys table to the caller.

      Advanced API for memory management (e.g., spilling).

      After calling this method:

      • The caller owns the returned Table and is responsible for closing it
      • The caller must ensure the returned Table remains valid (not closed, not spilled) for as long as this KeyRemapping object is in use
      • When this KeyRemapping is closed, it will NOT close the build keys table
      • This method can only be called once; subsequent calls throw IllegalStateException

      This is useful for scenarios where the caller wants to manage the build keys table separately, such as spilling it to disk and restoring it later, while keeping the native hash table alive.

      Returns:
      The build keys Table. The caller takes ownership and must close it when done.
      Throws:
      IllegalStateException - if already closed or if build keys were already released
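
The ownership rules listed above can be sketched with a small stand-in class. This is a model of the documented contract under stated assumptions, not the actual KeyRemapping source: SharedKeysOwner and TrackedResource are hypothetical names, TrackedResource stands in for the build keys Table, and making close() idempotent is a choice of this sketch.

```java
/** Stand-in for the build keys Table; records whether it was closed. */
class TrackedResource implements AutoCloseable {
  boolean closed = false;

  @Override
  public void close() {
    closed = true;
  }
}

/** Models the one-shot release contract described for releaseBuildKeys(). */
class SharedKeysOwner implements AutoCloseable {
  private final TrackedResource buildKeys;
  private boolean released = false;
  private boolean closed = false;

  SharedKeysOwner(TrackedResource buildKeys) {
    this.buildKeys = buildKeys;
  }

  /** Hands ownership to the caller; allowed once, and only before close(). */
  TrackedResource releaseBuildKeys() {
    if (closed) {
      throw new IllegalStateException("already closed");
    }
    if (released) {
      throw new IllegalStateException("build keys already released");
    }
    released = true;
    return buildKeys;
  }

  boolean isBuildKeysReleased() {
    return released;
  }

  @Override
  public void close() {
    if (closed) {
      return; // idempotent close is an assumption of this sketch
    }
    closed = true;
    if (!released) {
      buildKeys.close(); // still owned here, so release the reference
    }
  }
}
```

After a release, closing the owner leaves the resource open, so the caller must close it separately once the owner is no longer in use.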
    • isBuildKeysReleased

      public boolean isBuildKeysReleased()
      Check if the build keys have been released via releaseBuildKeys().
      Returns:
      true if build keys have been released, false otherwise
    • remapBuildKeys

      public ColumnVector remapBuildKeys()
      Remap build keys to integer IDs.

      Recomputes the remapped build column from the cached build keys. The result is not cached; each call recomputes it.

      For each row in the cached build table, returns the integer ID assigned to that key. Non-negative integers represent valid mapped keys, while negative values represent keys that cannot be mapped (e.g., null keys when nulls are unequal).

      Returns:
      A column of INT32 values with the remapped key IDs (caller must close)
    • remapProbeKeys

      public ColumnVector remapProbeKeys(Table keys)
      Remap probe keys to integer IDs.

      For each row in the input, returns the integer ID assigned to that key. The keys table must have the same schema (number and types of columns) as the build table used to construct this object.

      Non-negative integers represent keys found in the build table, while negative values represent keys that were not found or cannot be matched (e.g., null keys when nulls are unequal, or keys not present in the build table).

      Parameters:
      keys - The probe keys to remap (must have same schema as build table)
      Returns:
      A column of INT32 values with the remapped key IDs (caller must close)
      Throws:
      IllegalArgumentException - if keys has different number of columns than build table
      CudfException - if keys has different column types than build table
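
One way to use the remapped IDs is to match probe rows to build rows by integer equality, skipping negative values. The sketch below works on plain int arrays that stand in for the host-side contents of the INT32 columns returned by remapBuildKeys() and remapProbeKeys(); the class RemappedIdJoin and its method are hypothetical, and copying column data to the host is outside the scope of this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative inner join over remapped IDs; not part of the cudf API. */
class RemappedIdJoin {
  /** Returns {probeRow, buildRow} pairs whose IDs are equal and non-negative. */
  static List<int[]> innerJoin(int[] buildIds, int[] probeIds) {
    // Index build rows by ID, ignoring rows that carry a negative sentinel.
    Map<Integer, List<Integer>> rowsById = new HashMap<>();
    for (int row = 0; row < buildIds.length; row++) {
      if (buildIds[row] < 0) {
        continue;
      }
      rowsById.computeIfAbsent(buildIds[row], k -> new ArrayList<>()).add(row);
    }
    List<int[]> pairs = new ArrayList<>();
    for (int row = 0; row < probeIds.length; row++) {
      if (probeIds[row] < 0) {
        continue; // not found in the build table, or an unmatched null key
      }
      List<Integer> matches =
          rowsById.getOrDefault(probeIds[row], Collections.<Integer>emptyList());
      for (int buildRow : matches) {
        pairs.add(new int[] {row, buildRow});
      }
    }
    return pairs;
  }
}
```

Because every sentinel is negative, a single sign check is enough to drop unmatched rows on either side without knowing the specific sentinel values.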