cudf.get_dummies
- cudf.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, cats=None, sparse=False, drop_first=False, dtype='bool')
Returns a DataFrame whose columns are the one-hot (dummy) encodings of the columns of `data` selected by `columns`.
- Parameters:
- data : array-like, Series, or DataFrame
Data of which to get dummy indicators.
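- prefix : str, dict, or sequence, optional
Prefix to add to the dummy column names. Either a str (to apply a constant prefix), a dict mapping column names to prefixes, or a sequence of prefixes with the same length as the number of encoded columns. If not supplied, each DataFrame column uses its own name as the prefix (e.g. `a_value1` in the examples below), and the Series example below has no prefix.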
- prefix_sep : str, dict, or sequence, optional, default '_'
Separator placed between the prefix and the category value in each dummy column name.
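- dummy_na : boolean, optional
Add a column indicating null values (None/<NA>). If False, nulls are ignored. NaN values that are not treated as null are encoded as an ordinary category (`a_nan` in the examples below).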
- cats : dict, optional
Dictionary mapping column names to sequences of values representing that column's categories. If not supplied, the categories are the unique values of the column.
- sparse : boolean, optional
Currently not supported in cuDF.
- drop_first : boolean, optional
Whether to get k-1 dummies out of k categorical levels by removing the first level.
- columns : sequence of str, optional
Names of columns to encode. If not provided, columns with dtype object or categorical are encoded and the remaining columns are passed through unchanged, as in pandas; in the first example below, the integer column `b` is not encoded.
- dtype : str, optional
Data type of the dummy columns, default 'bool'.
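To make the interplay of `prefix`, `prefix_sep`, `dummy_na`, `cats`, `drop_first` and `dtype` concrete, here is a plain-Python model of encoding a single column. It is a sketch under stated assumptions, not cuDF's GPU implementation: a list stands in for a column, `None` stands in for a null, the values are assumed to be mutually comparable so they can be sorted, and the null indicator is placed first to match the column order in the examples. `one_hot` and `NA_LABEL` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

NA_LABEL = "<NA>"  # hypothetical: label of the null indicator column, as in the examples


def one_hot(
    values: Sequence[Any],
    prefix: Optional[str] = None,
    prefix_sep: str = "_",
    dummy_na: bool = False,
    cats: Optional[Sequence[Any]] = None,
    drop_first: bool = False,
    dtype: Callable[[bool], Any] = bool,
) -> Dict[str, List[Any]]:
    """Plain-Python model of one-hot encoding one column (None plays the role of null)."""
    # Categories: the explicit `cats` if given, else the sorted unique non-null values.
    if cats is None:
        cats = sorted({v for v in values if v is not None})
    levels: List[Any] = list(cats)
    if drop_first:
        # Keep k-1 dummies by removing the first level.
        levels = levels[1:]

    def name(label: str) -> str:
        # No prefix: the category label alone; otherwise prefix + separator + label.
        return label if prefix is None else f"{prefix}{prefix_sep}{label}"

    out: Dict[str, List[Any]] = {}
    if dummy_na:
        # Null indicator first, matching the column order in the examples.
        out[name(NA_LABEL)] = [dtype(v is None) for v in values]
    for level in levels:
        # One indicator column per category; `dtype` converts each True/False.
        out[name(str(level))] = [dtype(v == level) for v in values]
    return out
```

For instance, `one_hot([1, 2, None, 2, 4], dummy_na=True)` yields the columns `<NA>`, `1`, `2`, `4` with the same indicators as the Series example below, while `one_hot(["x", "y"], prefix="c", cats=["y", "x", "z"], dtype=int)` follows the order given in `cats`, keeps the all-zero `c_z` column, and produces 0/1 instead of True/False.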
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": ["value1", "value2", None], "b": [0, 0, 0]})
>>> cudf.get_dummies(df)
   b  a_value1  a_value2
0  0      True     False
1  0     False      True
2  0     False     False
>>> cudf.get_dummies(df, dummy_na=True)
   b  a_<NA>  a_value1  a_value2
0  0   False      True     False
1  0   False     False      True
2  0    True     False     False
>>> import numpy as np
>>> df = cudf.DataFrame({"a": cudf.Series([1, 2, np.nan, None],
...                     nan_as_null=False)})
>>> df
      a
0   1.0
1   2.0
2   NaN
3  <NA>
>>> cudf.get_dummies(df, dummy_na=True, columns=["a"])
   a_<NA>  a_1.0  a_2.0  a_nan
0   False   True  False  False
1   False  False   True  False
2   False  False  False   True
3    True  False  False  False
>>> series = cudf.Series([1, 2, None, 2, 4])
>>> series
0       1
1       2
2    <NA>
3       2
4       4
dtype: int64
>>> cudf.get_dummies(series, dummy_na=True)
    <NA>      1      2      4
0  False   True  False  False
1  False  False   True  False
2   True  False  False  False
3  False  False   True  False
4  False  False  False   True
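The DataFrame examples above show two layout rules: columns that are not encoded (such as `b`) pass through unchanged and come first, and each encoded column's dummies are prefixed with that column's name unless `prefix` says otherwise. The following plain-Python sketch models those rules on a dict of equal-length lists. It is an illustration, not cuDF's implementation: `dummies_frame` and `_one_hot` are hypothetical names, `None` stands in for null, and the default column selection (encode only string-valued columns, as an analogue of object dtype) is an assumption modeled on `b` being left alone in the first example.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

NA_LABEL = "<NA>"  # hypothetical: label of the null indicator, as in the examples


def _one_hot(values: List[Any], prefix: str, prefix_sep: str, dummy_na: bool) -> Dict[str, List[bool]]:
    """Encode one column; categories are the sorted unique non-null values."""
    levels = sorted({v for v in values if v is not None})
    out: Dict[str, List[bool]] = {}
    if dummy_na:
        out[f"{prefix}{prefix_sep}{NA_LABEL}"] = [v is None for v in values]
    for level in levels:
        out[f"{prefix}{prefix_sep}{level}"] = [v == level for v in values]
    return out


def dummies_frame(
    table: Dict[str, List[Any]],
    columns: Optional[Sequence[str]] = None,
    prefix: Union[None, str, Dict[str, str], Sequence[str]] = None,
    prefix_sep: str = "_",
    dummy_na: bool = False,
) -> Dict[str, List[Any]]:
    """Model of DataFrame-level dummy encoding on a dict of equal-length lists."""
    if columns is None:
        # Assumed default: encode only string-valued columns (analogue of object dtype).
        columns = [n for n, vals in table.items() if any(isinstance(v, str) for v in vals)]
    # Columns that are not encoded pass through unchanged and come first.
    out: Dict[str, List[Any]] = {n: vals for n, vals in table.items() if n not in columns}
    for i, name in enumerate(columns):
        if prefix is None:
            p = name  # default: the column's own name
        elif isinstance(prefix, str):
            p = prefix  # one constant prefix for every encoded column
        elif isinstance(prefix, dict):
            p = prefix[name]  # per-column prefix looked up by column name
        else:
            p = prefix[i]  # sequence of prefixes aligned with `columns`
        out.update(_one_hot(table[name], p, prefix_sep, dummy_na))
    return out
```

With `{"a": ["value1", "value2", None], "b": [0, 0, 0]}` as input, the default call returns the columns `b`, `a_value1`, `a_value2`, mirroring the first example; listing both columns in `columns` and passing a dict `prefix` encodes `b` as well.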