groupby

class cudf._lib.pylibcudf.groupby.GroupBy(Table keys, null_policy null_handling=null_policy.EXCLUDE, sorted keys_are_sorted=sorted.NO, list column_order=None, list null_precedence=None)

Group values by keys and compute various aggregate quantities.

For details, see cudf::groupby::groupby.

Parameters:
keys : Table

The columns to group by.

null_handling : null_policy, optional

Whether or not to include null rows in keys. Default is null_policy.EXCLUDE.

keys_are_sorted : sorted, optional

Whether the keys are already sorted. Default is sorted.NO.

column_order : list[order], optional

Indicates the sort order of each key column. Default is order.ASCENDING. Ignored if keys_are_sorted is sorted.NO.

null_precedence : list[null_order], optional

Indicates the ordering of null values in each column. Default is null_order.AFTER. Ignored if keys_are_sorted is sorted.NO.
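To make the effect of null_handling concrete, here is a pure-Python model of how rows are assigned to groups. It is a CPU sketch for illustration only, not pylibcudf code: each key row is a tuple (one entry per key column), None stands in for a null, and the hypothetical include_nulls flag plays the role of null_policy.INCLUDE versus null_policy.EXCLUDE.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One row of keys: one entry per key column, None models a null.
KeyRow = Tuple[Optional[int], ...]


def group_rows(keys: Sequence[KeyRow],
               include_nulls: bool = False) -> Dict[KeyRow, List[int]]:
    """Map each distinct key row to the indices of the rows that share it.

    Hypothetical model: with include_nulls=False (like null_policy.EXCLUDE),
    any row whose key contains a null is left out of every group. With
    include_nulls=True, null keys compare equal and form their own group.
    """
    groups: Dict[KeyRow, List[int]] = {}
    for i, row in enumerate(keys):
        if not include_nulls and any(v is None for v in row):
            continue  # excluded: this row has a null in at least one key column
        groups.setdefault(row, []).append(i)
    return groups


keys = [(1,), (2,), (None,), (1,), (None,)]
print(group_rows(keys))                      # {(1,): [0, 3], (2,): [1]}
print(group_rows(keys, include_nulls=True))  # {(1,): [0, 3], (2,): [1], (None,): [2, 4]}
```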

Methods

aggregate(self, list requests)

Compute aggregations on columns.

get_groups(self, Table values=None)

Get the grouped keys and values, along with the offsets of each group.

replace_nulls(self, Table value, ...)

Replace nulls in columns.

scan(self, list requests)

Compute scans on columns.

shift(self, Table values, list offset, ...)

Compute shifts on columns.

aggregate(self, list requests) → tuple

Compute aggregations on columns.

For details, see cudf::groupby::groupby::aggregate().

Parameters:
requests : List[GroupByRequest]

The list of GroupByRequest objects, each representing a set of aggregations to perform on a given column of values.

Returns:
Tuple[Table, List[Table, …]]

A tuple whose first element is the unique keys and whose second element is a list of tables of aggregation results. One table is returned for each aggregation request, with the columns corresponding to the sequence of aggregations in the request.
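The return layout described above can be modeled in pure Python. In this hypothetical sketch (not the pylibcudf API), a request is a pair of a value list and a list of aggregation names, keys are a single column with no nulls, and the model returns the unique keys plus one table per request, each holding one column per aggregation. Keys are emitted in sorted order for readability; the row order produced by libcudf may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical aggregation registry for this model only; it is not pylibcudf's
# Aggregation API. Each function reduces the values of one group to one result.
AGGS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
}

# A request pairs one value column with the names of the aggregations to run on it.
Request = Tuple[Sequence[float], Sequence[str]]


def aggregate(keys: Sequence[int], requests: Sequence[Request]
              ) -> Tuple[List[int], List[List[List[float]]]]:
    """Return (unique_keys, one table per request); a table is a list of columns."""
    unique_keys = sorted(set(keys))  # sorted for readability; libcudf order may differ
    results = []
    for values, agg_names in requests:
        table = []
        for name in agg_names:
            # One output column per aggregation, one row per unique key.
            column = [AGGS[name]([v for k, v in zip(keys, values) if k == key])
                      for key in unique_keys]
            table.append(column)
        results.append(table)
    return unique_keys, results


keys = [1, 2, 1, 2, 3]
uniq, tables = aggregate(keys, [
    ([10.0, 20.0, 30.0, 40.0, 50.0], ["sum", "count"]),
    ([1.0, 5.0, 2.0, 4.0, 3.0], ["min", "max"]),
])
print(uniq)    # [1, 2, 3]
print(tables)  # [[[40.0, 60.0, 50.0], [2, 2, 1]], [[1.0, 4.0, 3.0], [2.0, 5.0, 3.0]]]
```

Two requests produce two tables, and each table has as many columns as its request has aggregations.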

get_groups(self, Table values=None) → tuple

Get the grouped keys and values, along with the offsets of each group.

For details, see cudf::groupby::groupby::get_groups().

Parameters:
values : Table, optional

The value columns to group alongside the keys. If not specified, None is returned for the group values.

Returns:
Tuple[List[int], Table, Table]
A tuple containing three items:
  • A list of integer offsets into the group keys/values

  • A table of group keys

  • A table of group values or None
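A pure-Python model of these three items, assuming a single key column and no nulls (a hypothetical sketch, not pylibcudf code): rows are sorted by key, and the offsets mark where each group starts. In this model the offsets carry a trailing entry equal to the number of rows, so group i spans offsets[i]:offsets[i + 1]; ties are broken by original row position.

```python
from typing import List, Optional, Sequence, Tuple


def get_groups(keys: Sequence[int], values: Optional[Sequence[float]] = None
               ) -> Tuple[List[int], List[int], Optional[List[float]]]:
    """Model of get_groups: (offsets, grouped keys, grouped values or None)."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # stable: ties keep input order
    grouped_keys = [keys[i] for i in order]
    grouped_values = None if values is None else [values[i] for i in order]
    # A group starts wherever the key differs from the previous row's key.
    offsets = [i for i in range(len(grouped_keys))
               if i == 0 or grouped_keys[i] != grouped_keys[i - 1]]
    offsets.append(len(grouped_keys))  # end of the last group
    return offsets, grouped_keys, grouped_values


offsets, gkeys, gvals = get_groups([2, 1, 2, 3, 1], [10.0, 20.0, 30.0, 40.0, 50.0])
print(offsets)  # [0, 2, 4, 5]
print(gkeys)    # [1, 1, 2, 2, 3]
print(gvals)    # [20.0, 50.0, 10.0, 30.0, 40.0]
```

For example, the values of the second group (key 2) are gvals[offsets[1]:offsets[2]], that is [10.0, 30.0].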

replace_nulls(self, Table value, list replace_policies) → tuple

Replace nulls in columns.

For details, see cudf::groupby::groupby::replace_nulls().

Parameters:
value : Table

The columns to replace nulls in.

replace_policies : List[replace_policy]

The policies to use to replace nulls, one per column of value.

Returns:
Tuple[Table, Table]

A tuple whose first element is the grouped keys and whose second element is a table of values with nulls replaced.
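The per-group replacement can be sketched in pure Python. This is a hypothetical model, not pylibcudf code: None stands in for a null, and the policy strings "preceding" and "following" are assumed to correspond to replace_policy.PRECEDING (fill from the nearest earlier non-null value in the same group) and replace_policy.FOLLOWING (fill from the nearest later one). A null with no such value in its group stays null. The model keeps the input row order, which is a choice of this sketch.

```python
from typing import Dict, List, Optional, Sequence


def group_fill(keys: Sequence[int], values: Sequence[Optional[float]],
               policy: str = "preceding") -> List[Optional[float]]:
    """Fill nulls (None) from the nearest non-null value in the same group."""
    if policy not in ("preceding", "following"):
        raise ValueError(f"unknown policy: {policy}")
    indices = range(len(values))
    if policy == "following":
        indices = reversed(indices)  # walk backwards so later rows are seen first
    last: Dict[int, Optional[float]] = {}  # most recent non-null value per group
    out: List[Optional[float]] = list(values)
    for i in indices:
        if values[i] is None:
            out[i] = last.get(keys[i])  # stays None if the group has no such value
        else:
            last[keys[i]] = values[i]
    return out


keys = [1, 1, 2, 1, 2]
values = [1.0, None, None, 3.0, 5.0]
print(group_fill(keys, values))               # [1.0, 1.0, None, 3.0, 5.0]
print(group_fill(keys, values, "following"))  # [1.0, 3.0, 5.0, 3.0, 5.0]
```

Note that filling never crosses group boundaries: row 2 (key 2) stays null under "preceding" even though row 1 of key 1 was filled.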

scan(self, list requests) → tuple

Compute scans on columns.

For details, see cudf::groupby::groupby::scan().

Parameters:
requests : List[GroupByRequest]

The list of GroupByRequest objects, each representing a set of aggregations to perform on a given column of values.

Returns:
Tuple[Table, List[Table, …]]

A tuple whose first element is the grouped keys and whose second element is a list of tables of scan results. One table is returned for each request, with the columns corresponding to the sequence of aggregations in the request. Unlike aggregate, a scan yields one result per grouped row rather than one per group.
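A grouped scan can be modeled in pure Python as a running aggregate that restarts at each group boundary. This hypothetical sketch (not pylibcudf code) shows an inclusive sum scan over a single key column with no nulls, returning the keys and results in key-sorted order with ties kept in input order.

```python
from itertools import accumulate, groupby
from typing import List, Sequence, Tuple


def group_cumsum(keys: Sequence[int],
                 values: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Model of a grouped inclusive sum scan, returned in key-sorted order."""
    rows = sorted(zip(keys, values), key=lambda kv: kv[0])  # stable sort by key
    grouped_keys = [k for k, _ in rows]
    result: List[float] = []
    for _, group in groupby(rows, key=lambda kv: kv[0]):
        result.extend(accumulate(v for _, v in group))  # running sum restarts per group
    return grouped_keys, result


gk, sums = group_cumsum([2, 1, 2, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0])
print(gk)    # [1, 1, 2, 2, 2]
print(sums)  # [2.0, 6.0, 1.0, 4.0, 9.0]
```

The output has one entry per input row, whereas an aggregate over the same data would return one sum per key.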

shift(self, Table values, list offset, list fill_values) → tuple

Compute shifts on columns.

For details, see cudf::groupby::groupby::shift().

Parameters:
values : Table

The columns to shift.

offset : List[int]

The offsets to shift by, one per column of values. A positive offset moves values toward later rows within each group.

fill_values : List[Scalar]

The values used to fill the positions vacated by the shift, one per column of values.

Returns:
Tuple[Table, Table]

A tuple whose first element is the grouped keys and whose second element is a table of shifted values.
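A per-group shift can be modeled in pure Python. In this hypothetical sketch (not pylibcudf code), a single value column is shifted by one offset: within each group, output position j takes the value at position j - offset, and positions with no source value receive fill_value (None models a null). The real method takes one offset and one fill value per column; this model handles one column, with rows returned in key-sorted order.

```python
from typing import List, Optional, Sequence, Tuple


def group_shift(keys: Sequence[int], values: Sequence[Optional[float]],
                offset: int, fill_value: Optional[float] = None
                ) -> Tuple[List[int], List[Optional[float]]]:
    """Shift values within each group; vacated slots get fill_value."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # gather each group's rows
    grouped_keys = [keys[i] for i in order]
    grouped_vals = [values[i] for i in order]
    out: List[Optional[float]] = []
    start = 0
    while start < len(order):
        end = start
        while end < len(order) and grouped_keys[end] == grouped_keys[start]:
            end += 1  # find the end of the current group
        group = grouped_vals[start:end]
        for j in range(len(group)):
            src = j - offset  # position j reads from position j - offset
            out.append(group[src] if 0 <= src < len(group) else fill_value)
        start = end
    return grouped_keys, out


k = [1, 2, 1, 2, 1]
v = [1.0, 2.0, 3.0, 4.0, 5.0]
print(group_shift(k, v, 1, 0.0))  # ([1, 1, 1, 2, 2], [0.0, 1.0, 3.0, 0.0, 2.0])
print(group_shift(k, v, -1))      # ([1, 1, 1, 2, 2], [3.0, 5.0, None, 4.0, None])
```

Values never move across a group boundary; the first row of each group is filled when the offset is positive, and the last row when it is negative.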

class cudf._lib.pylibcudf.groupby.GroupByRequest(Column values, list aggregations)

A request for a groupby aggregation or scan.

This class is functionally polymorphic and can represent either an aggregation or a scan depending on the algorithm it is used with. For details on the libcudf types it converts to, see cudf::groupby::aggregation_request and cudf::groupby::scan_request.

Parameters:
values : Column

The column to aggregate.

aggregations : List[Aggregation]

The list of aggregations to perform.