minhash

pylibcudf.nvtext.minhash.minhash(Column input, uint32_t seed, Column a, Column b, size_type width) -> Column

Returns the minhash values for each string. This function uses MurmurHash3_x86_32 for the hash algorithm.

For details, see minhash().

Parameters:
input : Column

Strings column to compute minhash

seed : uint32_t

Seed used for the hash function

a : Column

1st parameter value used for the minhash algorithm.

b : Column

2nd parameter value used for the minhash algorithm.

width : size_type

Character width of the substrings hashed within each string

Returns:
Column

List column of minhash values for each string, one per value in columns a and b
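
To illustrate how `seed`, `a`, `b`, and `width` fit together, here is a minimal pure-Python sketch of a permuted minhash over character substrings. It is not the pylibcudf implementation: `zlib.crc32` stands in for MurmurHash3_x86_32, the permutation form `(a * h + b) mod p` with a Mersenne prime `p` is an assumption based on the usual minhash construction, and the names `base_hash` and `minhash_sketch` are hypothetical.

```python
import zlib

MERSENNE_PRIME = (1 << 61) - 1  # assumed modulus for the permutation step
MAX_HASH = 0xFFFFFFFF           # keep results in the 32-bit range


def base_hash(text, seed):
    # Stand-in for MurmurHash3_x86_32: crc32 started from `seed`.
    return zlib.crc32(text.encode("utf-8"), seed) & MAX_HASH


def minhash_sketch(strings, seed, a, b, width):
    """Return one list of len(a) minhash values per input string."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    result = []
    for s in strings:
        # Hash every substring of `width` characters; a string shorter
        # than `width` is hashed whole (a choice made for this sketch).
        count = max(len(s) - width + 1, 1)
        hashes = [base_hash(s[i:i + width], seed) for i in range(count)]
        # For each (a, b) pair, permute all hashes and keep the minimum.
        row = [
            min(((pa * h + pb) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
            for pa, pb in zip(a, b)
        ]
        result.append(row)
    return result


values = minhash_sketch(
    ["the quick brown fox", "the quick brown dog"],
    seed=0, a=[1, 3, 5], b=[2, 4, 6], width=4,
)
print(values)  # two rows, each with three minhash values
```

In this sketch the output has one row per input string and one value per `(a, b)` pair, which mirrors the list-column shape described above.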

pylibcudf.nvtext.minhash.minhash64(Column input, uint64_t seed, Column a, Column b, size_type width) -> Column

Returns the minhash values for each string. This function uses MurmurHash3_x64_128 for the hash algorithm.

For details, see minhash64().

Parameters:
input : Column

Strings column to compute minhash

seed : uint64_t

Seed used for the hash function

a : Column

1st parameter value used for the minhash algorithm.

b : Column

2nd parameter value used for the minhash algorithm.

width : size_type

Character width of the substrings hashed within each string

Returns:
Column

List column of minhash values for each string, one per value in columns a and b

pylibcudf.nvtext.minhash.minhash64_ngrams(Column input, size_type ngrams, uint64_t seed, Column a, Column b) -> Column

Returns the minhash values for each input row of strings. This function uses MurmurHash3_x64_128 for the hash algorithm.

For details, see minhash64_ngrams().

Parameters:
input : Column

List column of strings to compute minhash

ngrams : size_type

Number of consecutive strings to hash in each row

seed : uint64_t

Seed used for the hash function

a : Column

1st parameter value used for the minhash algorithm.

b : Column

2nd parameter value used for the minhash algorithm.

Returns:
Column

List column of minhash values for each row per value in columns a and b.

pylibcudf.nvtext.minhash.minhash_ngrams(Column input, size_type ngrams, uint32_t seed, Column a, Column b) -> Column

Returns the minhash values for each input row of strings. This function uses MurmurHash3_x86_32 for the hash algorithm.

For details, see minhash_ngrams().

Parameters:
input : Column

List column of strings to compute minhash

ngrams : size_type

Number of consecutive strings to hash in each row

seed : uint32_t

Seed used for the hash function

a : Column

1st parameter value used for the minhash algorithm.

b : Column

2nd parameter value used for the minhash algorithm.

Returns:
Column

List column of minhash values for each row per value in columns a and b.
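
The n-gram variants hash runs of consecutive strings within each row rather than character substrings. The following pure-Python sketch shows that idea under stated assumptions: `zlib.crc32` stands in for the MurmurHash3 functions, consecutive strings are combined by plain concatenation before hashing, and the permutation uses `(a * h + b) mod p`. None of these details are confirmed by the reference above, and `minhash_ngrams_sketch` is a hypothetical name.

```python
import zlib

MERSENNE_PRIME = (1 << 61) - 1  # assumed modulus for the permutation step
MAX_HASH = 0xFFFFFFFF           # keep results in the 32-bit range


def minhash_ngrams_sketch(rows, ngrams, seed, a, b):
    """Return one list of len(a) minhash values per row of strings."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    result = []
    for row in rows:
        if len(row) < ngrams:
            # This sketch's own choice; the library's behaviour may differ.
            raise ValueError("each row needs at least `ngrams` strings")
        # Hash each window of `ngrams` consecutive strings (assumed to be
        # joined by concatenation; crc32 is a stand-in hash function).
        hashes = [
            zlib.crc32("".join(row[i:i + ngrams]).encode("utf-8"), seed) & MAX_HASH
            for i in range(len(row) - ngrams + 1)
        ]
        # For each (a, b) pair, permute all window hashes and keep the minimum.
        result.append([
            min(((pa * h + pb) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
            for pa, pb in zip(a, b)
        ])
    return result


rows = [["the", "quick", "brown", "fox"], ["the", "quick", "brown", "dog"]]
ngram_values = minhash_ngrams_sketch(rows, ngrams=2, seed=0, a=[1, 3], b=[2, 4])
print(ngram_values)  # two rows, each with two minhash values
```

Here each input row is a list of strings, matching the "List column of strings" input, and each output row again holds one value per `(a, b)` pair.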