cudf.crosstab#

cudf.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=None, normalize=False)[source]#

Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.

Parameters:
indexarray-like, Series, or list of arrays/Series

Values to group by in the rows.

columnsarray-like, Series, or list of arrays/Series

Values to group by in the columns.

valuesarray-like, optional

Array of values to aggregate according to the factors. Requires aggfunc be specified.

rownameslist of str, default None

If passed, must match number of row arrays passed.

colnameslist of str, default None

If passed, must match number of column arrays passed.

aggfuncfunction, optional

If specified, requires values be specified as well.

marginsNot supported
margins_nameNot supported
dropnaNot supported
normalizeNot supported
Returns:
DataFrame

Cross tabulation of the data.

Examples

>>> a = cudf.Series(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = cudf.Series(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = cudf.Series(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> cudf.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2