cudf.DataFrame.hash_values

DataFrame.hash_values(method: Literal['murmur3', 'xxhash64', 'md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512'] = 'murmur3', seed: int | None = None) -> Series

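Compute the hash of the values in each row. For a Series, each element is hashed; for a DataFrame, the values in a row are hashed together into a single value.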

Parameters:
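method : {‘murmur3’, ‘xxhash64’, ‘md5’, ‘sha1’, ‘sha224’, ‘sha256’, ‘sha384’, ‘sha512’}, default ‘murmur3’

Hash function to use:

  • murmur3: MurmurHash3 hash function

  • xxhash64: xxHash64 hash function

  • md5: MD5 hash function

  • sha1: SHA-1 hash function

  • sha224: SHA-224 hash function

  • sha256: SHA-256 hash function

  • sha384: SHA-384 hash function

  • sha512: SHA-512 hash function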

seed : int, optional

Seed value to use for the hash function. This parameter is only supported for ‘murmur3’ and ‘xxhash64’.
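To illustrate what a seed does, the following is a minimal pure-Python sketch of the standard MurmurHash3 x86 32-bit function. It is not cudf's implementation. The helper name `murmur3_32` and the 8-byte little-endian encoding of the input value are assumptions made for illustration, so its results are not expected to match the cudf outputs shown in the examples.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86 32-bit; returns an unsigned 32-bit integer.

    Illustrative helper only (hypothetical name, not part of cudf).
    """
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_block(k: int) -> int:
        # Scramble one (possibly partial) little-endian block
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    # The seed is the initial state of the hash
    h = seed & mask

    # Body: process the input four bytes at a time
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= mix_block(k)
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, if any
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_block(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then avalanche the bits
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


# Assumed encoding: the integer 10 as 8 little-endian bytes
payload = (10).to_bytes(8, "little")
unseeded = murmur3_32(payload)          # seed defaults to 0
seeded = murmur3_32(payload, seed=42)   # same bytes, different seed
print(unseeded, seeded)
```

In this sketch the seed is simply the starting state, and every later step is invertible for a fixed input, so two different seeds give different hashes for the same bytes. MD5 and the SHA family define no such starting parameter, which is consistent with `seed` being supported only for some methods.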

Returns:
Series

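A Series with one hash value per row. In the examples below, ‘murmur3’ produces 32-bit integers and ‘md5’ produces hexadecimal strings (dtype object).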
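As a rough guide to the width of digest-style results, the sketch below uses Python's standard `hashlib` (not cudf) to show the hexadecimal digest length of MD5 and each SHA variant. It assumes the SHA methods return hexadecimal strings the way ‘md5’ does in the examples; the 8-byte little-endian input encoding is also an assumption, so the digests themselves need not match cudf's output.

```python
import hashlib

# Assumed encoding for illustration: the integer 10 as 8 little-endian bytes
value = (10).to_bytes(8, "little")

digest_lengths = {}
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    hex_digest = hashlib.new(name, value).hexdigest()
    # Two hex characters encode one byte of the digest
    digest_lengths[name] = len(hex_digest)

for name, length in digest_lengths.items():
    print(f"{name}: {length} hex characters ({length * 4} bits)")
```

The 32-character length for MD5 agrees with the ‘md5’ strings in the examples.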

Examples

Series

>>> import cudf
>>> series = cudf.Series([10, 120, 30])
>>> series
0     10
1    120
2     30
dtype: int64
>>> series.hash_values(method="murmur3")
0   -1930516747
1     422619251
2    -941520876
dtype: int32
>>> series.hash_values(method="md5")
0    7be4bbacbfdb05fb3044e36c22b41e8b
1    947ca8d2c5f0f27437f156cfbfab0969
2    d0580ef52d27c043c8e341fd5039b166
dtype: object
>>> series.hash_values(method="murmur3", seed=42)
0    2364453205
1     422621911
2    3353449140
dtype: uint32

DataFrame

>>> import cudf
>>> df = cudf.DataFrame({"a": [10, 120, 30], "b": [0.0, 0.25, 0.50]})
>>> df
     a     b
0   10  0.00
1  120  0.25
2   30  0.50
>>> df.hash_values(method="murmur3")
0    -330519225
1    -397962448
2   -1345834934
dtype: int32
>>> df.hash_values(method="md5")
0    57ce879751b5169c525907d5c563fae1
1    948d6221a7c4963d4be411bcead7e32b
2    fe061786ea286a515b772d91b0dfcd70
dtype: object