cudf.Series.factorize

Series.factorize(sort=False, use_na_sentinel=True)

Encode the input values as integer labels.

Parameters:
sort : bool, default False

If True, sort the uniques and remap the codes so that each code still refers to the same value.

use_na_sentinel : bool, default True

If True, NA values are encoded with the sentinel -1. If False, NA values are encoded as a non-negative integer and NA is kept in the uniques.

Returns:
(labels, cats) : (cupy.ndarray, cupy.ndarray or Index)
  • labels contains the encoded values

  • cats contains the categories, ordered so that the N-th item corresponds to code N-1.

Examples

>>> import cudf
>>> s = cudf.Series(['a', 'a', 'c'])
>>> codes, uniques = s.factorize()
>>> codes
array([0, 0, 1], dtype=int8)
>>> uniques
Index(['a', 'c'], dtype='object')
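
The effect of sort and use_na_sentinel can be illustrated without a GPU. The following is a minimal pure-Python sketch of the semantics described above, not cuDF's implementation. The helper name factorize_sketch is hypothetical, missing values are represented by None, and placing NA last among the uniques when use_na_sentinel=False is an assumption of this sketch rather than documented cuDF behavior.

```python
def factorize_sketch(values, sort=False, use_na_sentinel=True):
    """Illustrate factorize semantics on a plain list (hypothetical helper)."""
    # Collect distinct non-missing values in order of first appearance.
    uniques = []
    for v in values:
        if v is not None and v not in uniques:
            uniques.append(v)

    # Optionally sort the uniques; codes are derived afterwards, so each
    # code still refers to the same value.
    if sort:
        uniques.sort()

    # Assumption of this sketch: when NA is kept, it goes last in uniques.
    has_na = any(v is None for v in values)
    if has_na and not use_na_sentinel:
        uniques.append(None)

    # Map each value to its position in uniques; NA becomes -1 when the
    # sentinel is enabled.
    code_of = {v: i for i, v in enumerate(uniques)}
    codes = [
        -1 if (v is None and use_na_sentinel) else code_of[v]
        for v in values
    ]
    return codes, uniques


data = ["c", "a", None, "c"]
print(factorize_sketch(data))                         # ([0, 1, -1, 0], ['c', 'a'])
print(factorize_sketch(data, sort=True))              # ([1, 0, -1, 1], ['a', 'c'])
print(factorize_sketch(data, use_na_sentinel=False))  # ([0, 1, 2, 0], ['c', 'a', None])
```

In every case, uniques[code] recovers the original value for each non-negative code, which is the relationship between labels and cats described under Returns.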