Column Hash#

group column_hash

Typedefs

using hash_value_type = uint32_t#

Type of hash value.

Functions

std::unique_ptr<column> murmurhash3_x86_32(table_view const &input, uint32_t seed = DEFAULT_HASH_SEED, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the MurmurHash3 32-bit hash value of each row in the given table.

This function computes the hash of each column using the seed for the first column and the resulting hash as a seed for the next column and so on. The result is a uint32 value for each row.

Parameters:
  • input – The table of columns to hash

  • seed – Optional seed value to use for the hash function

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<table> murmurhash3_x64_128(table_view const &input, uint64_t seed = DEFAULT_HASH_SEED, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the MurmurHash3 64-bit hash value of each row in the given table.

This function takes a 64-bit seed value and returns hash values using the MurmurHash3_x64_128 algorithm. The hash produces in two uint64 values per row.

Parameters:
  • input – The table of columns to hash

  • seed – Optional seed value to use for the hash function

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A table of two UINT64 columns

std::unique_ptr<column> md5(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the MD5 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> sha1(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the SHA-1 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> sha224(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the SHA-224 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> sha256(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the SHA-256 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> sha384(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the SHA-384 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> sha512(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the SHA-512 hash value of each row in the given table.

Parameters:
  • input – The table of columns to hash

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

std::unique_ptr<column> xxhash_64(table_view const &input, uint64_t seed = DEFAULT_HASH_SEED, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Computes the XXHash_64 hash value of each row in the given table.

This function takes a 64-bit seed value and returns a column of type UINT64.

Parameters:
  • input – The table of columns to hash

  • seed – Optional seed value to use for the hash function

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned column’s device memory

Returns:

A column where each row is the hash of a row from the input

Variables

static constexpr uint32_t DEFAULT_HASH_SEED = 0#

The default seed value for hash functions.