cudf.Series.factorize
- Series.factorize(sort=False, na_sentinel=None, use_na_sentinel=None)
Encode the input values as integer labels.
- Parameters:
- sort : bool, default False
Sort uniques and shuffle codes to maintain the relationship.
- na_sentinel : number, default -1
Value to indicate missing category.
Deprecated since version 23.04: The na_sentinel argument is deprecated and will be removed in a future version of cudf. Specify use_na_sentinel as either True or False.
- use_na_sentinel : bool, default True
If True, the sentinel -1 is used for NA values. If False, NA values are encoded as a non-negative integer and NA is kept in the uniques rather than dropped.
- Returns:
- (labels, cats) : (cupy.ndarray, cupy.ndarray or Index)
labels contains the encoded values.
cats contains the categories, ordered so that the N-th item (counting from 1) corresponds to code N-1.
Examples
>>> import cudf
>>> s = cudf.Series(['a', 'a', 'c'])
>>> codes, uniques = s.factorize()
>>> codes
array([0, 0, 1], dtype=int8)
>>> uniques
StringIndex(['a' 'c'], dtype='object')
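To make the effect of sort and use_na_sentinel concrete, the following is a pure-Python sketch of the encoding described above. It is an illustration only, not cuDF's implementation: the helper name factorize_sketch is hypothetical, missing values are represented by None, results are plain lists rather than cupy arrays, and placing NA last among the uniques when sorting is an assumption made for this sketch.

```python
def factorize_sketch(values, sort=False, use_na_sentinel=True):
    """Hypothetical helper mimicking the documented encoding on plain lists."""
    # Collect distinct values in order of first appearance.
    # With use_na_sentinel=True, missing values (None) are left out of the uniques.
    uniques = []
    for v in values:
        if v is None and use_na_sentinel:
            continue
        if v not in uniques:
            uniques.append(v)

    if sort:
        # Sort the non-missing uniques; keep None last (an assumption of this sketch).
        has_na = None in uniques
        uniques = sorted(u for u in uniques if u is not None)
        if has_na:
            uniques.append(None)

    # Each value's code is its position in uniques; values absent from uniques
    # (only None, when use_na_sentinel=True) get the sentinel -1.
    code_of = {u: i for i, u in enumerate(uniques)}
    codes = [code_of.get(v, -1) for v in values]
    return codes, uniques


values = ['c', 'a', None, 'c']
print(factorize_sketch(values))                         # ([0, 1, -1, 0], ['c', 'a'])
print(factorize_sketch(values, sort=True))              # ([1, 0, -1, 1], ['a', 'c'])
print(factorize_sketch(values, use_na_sentinel=False))  # ([0, 1, 2, 0], ['c', 'a', None])
```

In every case uniques[code] recovers the original value for each non-negative code, which is the relationship that sorting "shuffles codes to maintain".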