minhash
- pylibcudf.nvtext.minhash.minhash(input, seeds, width=4) → Column
Returns the minhash values for each string per seed. This function uses MurmurHash3_x86_32 for the hash algorithm.
For details, see minhash() in the libcudf C++ API.
- Parameters:
- input : Column
Strings column for which to compute minhash values.
- seeds : Column or Scalar
Seed value(s) used for the hash algorithm.
- width : size_type
Character width of the substrings to be hashed. Default is 4 characters.
- Returns:
- Column
Lists column containing, for each string, one minhash value per seed.
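The computation described above can be sketched in pure Python as a CPU reference. This is a minimal sketch, not the libcudf implementation: it assumes each string is split into overlapping substrings of width characters, each substring is hashed with MurmurHash3_x86_32 once per seed, and the minimum hash per seed is kept. The helper names murmurhash3_x86_32 and minhash_strings are hypothetical, and hashing strings shorter than width as a single substring is an assumption that may not match libcudf.

```python
def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3_x86_32 returning an unsigned 32-bit value."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl32(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl32((k * c1) & mask, 15)
        h ^= (k * c2) & mask
        h = (rotl32(h, 13) * 5 + 0xE6546B64) & mask
    # Tail: the remaining 1-3 bytes, read little-endian.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = rotl32((k * c1) & mask, 15)
        h ^= (k * c2) & mask
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def minhash_strings(strings: list[str], seeds: list[int], width: int = 4) -> list[list[int]]:
    """For each string, return one minimum hash per seed over its width-character substrings."""
    result = []
    for s in strings:
        # Assumption: a string shorter than `width` is hashed as a single substring.
        count = max(len(s) - width + 1, 1)
        substrings = [s[i:i + width].encode("utf-8") for i in range(count)]
        result.append([min(murmurhash3_x86_32(sub, seed) for sub in substrings) for seed in seeds])
    return result
```

For example, minhash_strings(["hello world"], seeds=[0, 1]) returns a single inner list holding two unsigned 32-bit values, mirroring the one-list-per-string shape of the returned Column.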
- pylibcudf.nvtext.minhash.minhash64(input, seeds, width=4) → Column
Returns the minhash values for each string per seed. This function uses MurmurHash3_x64_128 for the hash algorithm.
For details, see minhash64() in the libcudf C++ API.
- Parameters:
- input : Column
Strings column for which to compute minhash values.
- seeds : Column or Scalar
Seed value(s) used for the hash algorithm.
- width : size_type
Character width of the substrings to be hashed. Default is 4 characters.
- Returns:
- Column
Lists column containing, for each string, one minhash value per seed.
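A comparable pure-Python sketch for the 64-bit variant implements MurmurHash3_x64_128 and keeps only the first 64 bits (the h1 half) of each 128-bit hash. That truncation is an assumption borrowed from the word_minhash64 description; the minhash64 text does not state it. Each string is split into overlapping width-character substrings, and strings shorter than width are hashed whole (also an assumption). The names murmurhash3_x64_128 and minhash64_strings are hypothetical, and the output is not guaranteed to match libcudf bit for bit.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    # Final avalanche mix for one 64-bit half.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmurhash3_x64_128(data: bytes, seed: int = 0) -> tuple[int, int]:
    """Pure-Python MurmurHash3_x64_128; returns the (h1, h2) 64-bit halves."""
    c1, c2 = 0x87C37B91114253D5, 0x4CF5AD432745937F
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    # Body: mix each 16-byte block as two little-endian 64-bit words.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        h1 ^= (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    # Tail: up to 15 leftover bytes; bytes 8-14 feed k2, bytes 0-7 feed k1.
    tail = data[16 * nblocks:]
    if len(tail) > 8:
        k2 = int.from_bytes(tail[8:], "little")
        h2 ^= (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
    if tail:
        k1 = int.from_bytes(tail[:8], "little")
        h1 ^= (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
    # Finalization.
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2


def minhash64_strings(strings: list[str], seeds: list[int], width: int = 4) -> list[list[int]]:
    """For each string, return one minimum 64-bit hash per seed over its substrings."""
    result = []
    for s in strings:
        # Assumption: a string shorter than `width` is hashed as a single substring.
        count = max(len(s) - width + 1, 1)
        substrings = [s[i:i + width].encode("utf-8") for i in range(count)]
        # Assumption: only h1, the first 64 bits of the 128-bit hash, is kept.
        result.append([min(murmurhash3_x64_128(sub, seed)[0] for sub in substrings) for seed in seeds])
    return result
```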
- pylibcudf.nvtext.minhash.word_minhash(Column input, Column seeds) → Column
Returns the minhash values for each row of strings per seed. This function uses MurmurHash3_x86_32 for the hash algorithm.
For details, see word_minhash() in the libcudf C++ API.
- Parameters:
- input : Column
Lists column of strings for which to compute minhash values.
- seeds : Column
Seed values used for the hash algorithm.
- Returns:
- Column
Lists column containing, for each row, one minhash value per seed.
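The row-level variant can be sketched the same way, except that each row is a list of words and each whole word (rather than a character window) is hashed with MurmurHash3_x86_32, keeping the minimum per seed. This is a minimal pure-Python sketch under stated assumptions, not the libcudf implementation: murmurhash3_x86_32 and word_minhash_rows are hypothetical helper names, and returning None for an empty row is an assumption rather than documented libcudf behavior.

```python
from typing import Optional


def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3_x86_32 returning an unsigned 32-bit value."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl32(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):  # 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= (rotl32((k * c1) & mask, 15) * c2) & mask
        h = (rotl32(h, 13) * 5 + 0xE6546B64) & mask
    tail = data[4 * nblocks:]
    if tail:  # remaining 1-3 bytes
        k = int.from_bytes(tail, "little")
        h ^= (rotl32((k * c1) & mask, 15) * c2) & mask
    h ^= len(data)  # finalization mix (fmix32)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    return h ^ (h >> 16)


def word_minhash_rows(rows: list[list[str]], seeds: list[int]) -> list[Optional[list[int]]]:
    """For each row of words, return the minimum word hash for each seed."""
    result: list[Optional[list[int]]] = []
    for row in rows:
        if not row:
            # Assumption: an empty row yields None here.
            result.append(None)
            continue
        words = [w.encode("utf-8") for w in row]
        result.append([min(murmurhash3_x86_32(w, seed) for w in words) for seed in seeds])
    return result
```

Because every word is hashed whole, the order of words within a row does not affect its signature.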
- pylibcudf.nvtext.minhash.word_minhash64(Column input, Column seeds) → Column
Returns the minhash values for each row of strings per seed. This function uses MurmurHash3_x64_128 for the hash algorithm, although only the first 64 bits of the hash are used to compute the output.
For details, see word_minhash64() in the libcudf C++ API.
- Parameters:
- input : Column
Lists column of strings for which to compute minhash values.
- seeds : Column
Seed values used for the hash algorithm.
- Returns:
- Column
Lists column containing, for each row, one minhash value per seed.
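A corresponding pure-Python sketch for word_minhash64 hashes each word with MurmurHash3_x64_128 and keeps only the first 64 bits, as described above; treating those bits as the h1 half of the reference algorithm's output is an assumption. Because each signature entry is the minimum over a row's words for one seed, the fraction of seeds on which two rows' signatures agree is a standard estimate of the Jaccard similarity of their word sets, which estimate_jaccard illustrates. All names (murmurhash3_x64_128, word_minhash64_rows, estimate_jaccard) are hypothetical, and empty rows return None by assumption.

```python
from typing import Optional

MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    return k ^ (k >> 33)


def murmurhash3_x64_128(data: bytes, seed: int = 0) -> tuple[int, int]:
    """Pure-Python MurmurHash3_x64_128; returns the (h1, h2) 64-bit halves."""
    c1, c2 = 0x87C37B91114253D5, 0x4CF5AD432745937F
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    for i in range(nblocks):  # 16-byte blocks as two little-endian words
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        h1 ^= (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
        h1 = ((_rotl64(h1, 27) + h2) * 5 + 0x52DCE729) & MASK64
        h2 ^= (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
        h2 = ((_rotl64(h2, 31) + h1) * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks:]
    if len(tail) > 8:  # tail bytes 8-14
        k2 = int.from_bytes(tail[8:], "little")
        h2 ^= (_rotl64((k2 * c2) & MASK64, 33) * c1) & MASK64
    if tail:  # tail bytes 0-7
        k1 = int.from_bytes(tail[:8], "little")
        h1 ^= (_rotl64((k1 * c1) & MASK64, 31) * c2) & MASK64
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    return h1, (h2 + h1) & MASK64


def word_minhash64_rows(rows: list[list[str]], seeds: list[int]) -> list[Optional[list[int]]]:
    """Per row, the minimum first-64-bit word hash for each seed (None for empty rows)."""
    result: list[Optional[list[int]]] = []
    for row in rows:
        words = [w.encode("utf-8") for w in row]
        result.append([min(murmurhash3_x64_128(w, s)[0] for w in words) for s in seeds] if words else None)
    return result


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of seeds on which two signatures hold the same minimum hash."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

More seeds give a lower-variance similarity estimate at the cost of longer signatures.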