minhash

pylibcudf.nvtext.minhash.minhash(input, seeds, width=4)

Returns the minhash values for each string per seed. This function uses MurmurHash3_x86_32 for the hash algorithm.

For details, see the libcudf C++ function minhash().

Parameters:
input : Column

Strings column for which to compute minhash values.

seeds : Column or Scalar

Seed value(s) used for the hash algorithm.

width : size_type

Character width of the substrings that are hashed; the default is 4 characters.

Returns:
Column

List column of minhash values for each string per seed
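The description above can be illustrated with a small CPU-only sketch. It is not the pylibcudf implementation and does not call the library; `murmur3_x86_32` and `minhash_sketch` are hypothetical helper names. It assumes that each substring is hashed as its UTF-8 bytes and that a string shorter than `width` is hashed whole, neither of which is stated on this page.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3_x86_32, returned as an unsigned 32-bit int."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & mask
    n = len(data)
    nblocks = n // 4
    # Body: mix one little-endian 4-byte block at a time.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def minhash_sketch(strings, seeds, width=4):
    """For each string, return one minimum hash value per seed."""
    result = []
    for s in strings:
        # Assumption: a string shorter than `width` yields a single
        # substring, the whole string.
        count = max(len(s) - width + 1, 1)
        subs = [s[i:i + width].encode("utf-8") for i in range(count)]
        result.append(
            [min(murmur3_x86_32(b, seed) for b in subs) for seed in seeds]
        )
    return result


# One inner list per input string, one value per seed.
signatures = minhash_sketch(
    ["the quick brown fox", "the quick brown dog"], seeds=[0, 1, 2]
)
```

Each inner list corresponds to one row of the returned list column: the number of inner lists matches the number of input strings, and the length of each matches the number of seeds.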

pylibcudf.nvtext.minhash.minhash64(input, seeds, width=4)

Returns the minhash values for each string per seed. This function uses MurmurHash3_x64_128 for the hash algorithm.

For details, see the libcudf C++ function minhash64().

Parameters:
input : Column

Strings column for which to compute minhash values.

seeds : Column or Scalar

Seed value(s) used for the hash algorithm.

width : size_type

Character width of the substrings that are hashed; the default is 4 characters.

Returns:
Column

List column of minhash values for each string per seed
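A 64-bit variant can be sketched in pure Python in the same way. This is only an illustration, not the library's code; `murmur3_x64_128_low64` and `minhash64_sketch` are hypothetical names. It assumes that only the first 64 bits of the 128-bit hash are kept (as the word_minhash64 description states for that function), that substrings are hashed as UTF-8 bytes, and that a string shorter than `width` is hashed whole.

```python
MASK64 = (1 << 64) - 1


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalization mix."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128_low64(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x64_128, returning only the first 64 bits (h1)."""
    c1, c2 = 0x87C37B91114253D5, 0x4CF5AD432745937F
    h1 = h2 = seed & MASK64
    n = len(data)
    nblocks = n // 16
    # Body: two little-endian 64-bit lanes per 16-byte block.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        k1 = (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
        h1 ^= k1
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        k2 = (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
        h2 ^= k2
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    # Tail: up to 15 leftover bytes; an empty lane mixes to zero,
    # so XOR-ing it in is a no-op.
    tail = data[16 * nblocks:]
    k1 = int.from_bytes(tail[:8], "little")
    k2 = int.from_bytes(tail[8:], "little")
    h1 ^= (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
    h2 ^= (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
    # Finalization.
    h1 ^= n
    h2 ^= n
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    return (h1 + h2) & MASK64


def minhash64_sketch(strings, seeds, width=4):
    """For each string, return one 64-bit minimum hash value per seed."""
    result = []
    for s in strings:
        # Assumption: short strings are hashed whole.
        count = max(len(s) - width + 1, 1)
        subs = [s[i:i + width].encode("utf-8") for i in range(count)]
        result.append(
            [min(murmur3_x64_128_low64(b, seed) for b in subs) for seed in seeds]
        )
    return result


signatures64 = minhash64_sketch(["the quick brown fox"], seeds=[0, 1])
```

The structure is the same as for the 32-bit function; only the hash and the width of the output values differ.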

pylibcudf.nvtext.minhash.word_minhash(Column input, Column seeds) → Column

Returns the minhash values for each row of strings per seed. This function uses MurmurHash3_x86_32 for the hash algorithm.

For details, see the libcudf C++ function word_minhash().

Parameters:
input : Column

Lists column of strings for which to compute minhash values.

seeds : Column

Seed values used for the hash algorithm.

Returns:
Column

List column of minhash values for each row per seed
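The difference from the character-based functions is that each input row is already a list of strings, and each string in the row is hashed as a whole. A minimal CPU-only sketch of that idea follows. It is not the library's code: `word_minhash_sketch` is a hypothetical name, and `zlib.crc32` is used purely as a stand-in seeded hash so the example stays short; the actual function uses MurmurHash3_x86_32. Returning an empty list for an empty row is also an assumption.

```python
import zlib


def word_minhash_sketch(rows, seeds, hash_fn=zlib.crc32):
    """rows: list of lists of strings. Returns one list of minima per row."""
    result = []
    for words in rows:
        encoded = [w.encode("utf-8") for w in words]
        if not encoded:
            # Assumption: an empty row produces an empty signature.
            result.append([])
            continue
        # For each seed, hash every word in the row and keep the minimum.
        result.append(
            [min(hash_fn(b, seed) for b in encoded) for seed in seeds]
        )
    return result


rows = [["the", "quick", "brown", "fox"], ["fox", "brown", "quick", "the"]]
word_signatures = word_minhash_sketch(rows, seeds=[0, 1, 2])
```

Because only the minimum over the row is kept, the two example rows, which contain the same words in a different order, produce identical signatures in this sketch.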

pylibcudf.nvtext.minhash.word_minhash64(Column input, Column seeds) → Column

Returns the minhash values for each row of strings per seed. This function uses MurmurHash3_x64_128 for the hash algorithm, though only the first 64 bits of the hash are used in computing the output.

For details, see the libcudf C++ function word_minhash64().

Parameters:
input : Column

Lists column of strings for which to compute minhash values.

seeds : Column

Seed values used for the hash algorithm.

Returns:
Column

List column of minhash values for each row per seed