minhash

pylibcudf.nvtext.minhash.minhash(Column input, uint32_t seed, Column a, Column b, size_type width) → Column

Returns the minhash values for each string. This function uses MurmurHash3_x86_32 for the hash algorithm.

For details, see the libcudf C++ function minhash().

Parameters:
input : Column

Strings column for which to compute minhash values.

seed : uint32_t

Seed used for the hash function.

a : Column

First column of parameter values used by the minhash algorithm.

b : Column

Second column of parameter values used by the minhash algorithm.

width : size_type

Character width of the substrings of each string that are hashed.

Returns:
Column

List column of minhash values for each string per seed
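
To make the roles of `seed`, `a`, `b`, and `width` concrete, here is a minimal pure-Python sketch of a permutation-style minhash. It is an illustration only, not the library's implementation. The function name `minhash_sketch` is hypothetical, `zlib.crc32` stands in for MurmurHash3_x86_32, and three things are assumptions rather than facts from this page: the permutation formula `(a * h + b) mod p`, the Mersenne prime modulus, and the handling of strings shorter than `width`.

```python
import zlib

# Assumptions for illustration only: the modulus and the 32-bit mask are
# choices made for this sketch, not documented behavior of the library.
MERSENNE_PRIME = (1 << 61) - 1
MAX_U32 = 0xFFFFFFFF


def minhash_sketch(strings, seed, a, b, width):
    """Return one list of len(a) minhash values per input string.

    zlib.crc32 is a stand-in for MurmurHash3_x86_32, so the values
    will not match what pylibcudf produces.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if width < 1:
        raise ValueError("width must be at least 1")

    rows = []
    for s in strings:
        # Hash every substring of `width` characters. A string shorter than
        # `width` is hashed as a whole (an assumption of this sketch).
        n_substrings = max(1, len(s) - width + 1)
        hashes = [
            zlib.crc32(s[i:i + width].encode("utf-8"), seed)
            for i in range(n_substrings)
        ]
        # For each (a_j, b_j) pair, permute every hash and keep the minimum.
        rows.append([
            min(((a_j * h + b_j) % MERSENNE_PRIME) & MAX_U32 for h in hashes)
            for a_j, b_j in zip(a, b)
        ])
    return rows


signatures = minhash_sketch(
    ["the quick brown fox", "the quick brown dog"],
    seed=0,
    a=[1, 3, 5],
    b=[0, 7, 11],
    width=4,
)
```

In this sketch each input string yields a list with one value per `(a, b)` pair, which mirrors the list-column shape of the return value described above.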

pylibcudf.nvtext.minhash.minhash64(Column input, uint64_t seed, Column a, Column b, size_type width) → Column

Returns the minhash values for each string. This function uses MurmurHash3_x64_128 for the hash algorithm.

For details, see the libcudf C++ function minhash64().

Parameters:
input : Column

Strings column for which to compute minhash values.

seed : uint64_t

Seed used for the hash function.

a : Column

First column of parameter values used by the minhash algorithm.

b : Column

Second column of parameter values used by the minhash algorithm.

width : size_type

Character width of the substrings of each string that are hashed.

Returns:
Column

List column of minhash values for each string per seed