groupby
- class pylibcudf.groupby.GroupBy(Table keys, null_policy null_handling=null_policy.EXCLUDE, sorted keys_are_sorted=sorted.NO, list column_order=None, list null_precedence=None)
Group values by keys and compute various aggregate quantities.
For details, see cudf::groupby::groupby.
- Parameters:
- keys : Table
The columns to group by.
- null_handling : null_policy, optional
Whether to include rows whose keys contain nulls. Default is null_policy.EXCLUDE.
- keys_are_sorted : sorted, optional
Whether the keys are already sorted. Default is sorted.NO.
- column_order : list[order], optional
Indicates the order of each column. Default is order.ASCENDING. Ignored if keys_are_sorted is sorted.NO.
- null_precedence : list[null_order], optional
Indicates the ordering of null values in each column. Default is null_order.AFTER. Ignored if keys_are_sorted is sorted.NO.
Methods
- aggregate(self, list requests): Compute aggregations on columns.
- get_groups(self, Table values=None): Get the grouped keys and values labels for each row.
- replace_nulls(self, Table value, ...): Replace nulls in columns.
- scan(self, list requests): Compute scans on columns.
- shift(self, Table values, list offset, ...): Compute shifts on columns.
- aggregate(self, list requests) → tuple
Compute aggregations on columns.
For details, see cudf::groupby::groupby::aggregate().
- Parameters:
- requests : List[GroupByRequest]
The list of GroupByRequest objects, each representing a set of aggregations to perform on a given column of values.
- Returns:
- Tuple[Table, List[Table]]
A tuple whose first element is the unique keys and whose second element is a list of tables of aggregation results. One table is returned for each aggregation request, with the columns corresponding to the sequence of aggregations in the request.
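The shape of the return value can be easier to see in a small model. The following is a plain-Python sketch, not pylibcudf code: `model_aggregate` is a hypothetical helper, plain lists stand in for Column and Table, Python callables stand in for Aggregation objects, and `None` stands in for a null key.

```python
from collections import defaultdict


def model_aggregate(keys, requests):
    """Plain-Python model of the (keys, results) layout; not pylibcudf code.

    keys: list of hashable key values, one per row.
    requests: list of (values, [functions]) pairs, standing in for
        GroupByRequest(values, aggregations).
    """
    # Collect the row indices of each key. None keys are dropped, which
    # mirrors null_policy.EXCLUDE.
    rows = defaultdict(list)
    for i, k in enumerate(keys):
        if k is not None:
            rows[k].append(i)
    # Sorted only to make this sketch deterministic.
    unique_keys = sorted(rows)
    results = []
    for values, funcs in requests:
        # One "table" per request: a list of columns, one per aggregation,
        # each column holding one entry per unique key.
        table = [[f([values[i] for i in rows[k]]) for k in unique_keys] for f in funcs]
        results.append(table)
    return unique_keys, results


keys = ["a", "b", "a", None, "b"]
x = [1, 2, 3, 4, 5]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
unique_keys, results = model_aggregate(keys, [(x, [sum, max]), (y, [min])])
# results[0] has two columns (sum and max of x); results[1] has one (min of y).
```

The sketch sorts the keys only to keep its output reproducible; it should not be read as a statement about the row order the library returns.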
- get_groups(self, Table values=None) → tuple
Get the grouped keys and values labels for each row.
For details, see cudf::groupby::groupby::get_groups().
- Parameters:
- values : Table, optional
The columns to get group labels for. If not specified, None is returned for the group values.
- Returns:
- Tuple[List[int], Table, Table]
A tuple containing three items:
- A list of integer offsets into the group keys/values
- A table of group keys
- A table of group values, or None if values was not specified
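As an illustration of how the three returned items relate, here is a plain-Python sketch. It is not pylibcudf code: `model_get_groups` is a hypothetical helper, lists stand in for tables, and it assumes that the offsets list has one entry per group plus a final entry equal to the row count.

```python
def model_get_groups(keys, values=None):
    """Plain-Python model of grouped keys, values, and offsets.

    Assumes at least one row and a single key column.
    """
    # A stable sort of row indices by key brings each group's rows together.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    grouped_keys = [keys[i] for i in order]
    # offsets[g] is the first row of group g; the last entry is the row count
    # (an assumption of this sketch).
    offsets = [0]
    for pos in range(1, len(grouped_keys)):
        if grouped_keys[pos] != grouped_keys[pos - 1]:
            offsets.append(pos)
    offsets.append(len(grouped_keys))
    grouped_values = None if values is None else [values[i] for i in order]
    return offsets, grouped_keys, grouped_values


offsets, grouped_keys, grouped_values = model_get_groups(
    [2, 1, 2, 1, 3], [10, 20, 30, 40, 50]
)
# Rows of group g are grouped_values[offsets[g]:offsets[g + 1]].
first_group = grouped_values[offsets[0]:offsets[1]]
no_values = model_get_groups([2, 1, 2, 1, 3])
```

Under that assumption, consecutive offsets delimit each group's slice of the grouped keys and values.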
- replace_nulls(self, Table value, list replace_policies) → tuple
Replace nulls in columns.
For details, see cudf::groupby::groupby::replace_nulls().
- Parameters:
- value : Table
The columns to replace nulls in.
- replace_policies : List[replace_policy]
The policies to use to replace nulls.
- Returns:
- Tuple[Table, Table]
A tuple whose first element is the group’s keys and whose second element is a table of values with nulls replaced.
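A grouped null replacement can be pictured as a fill that never crosses a group boundary. The sketch below is plain Python, not pylibcudf code: `model_replace_nulls` is a hypothetical helper that handles a single column, `None` stands in for null, and the strings "PRECEDING" and "FOLLOWING" stand in for replace_policy values. It assumes a null is replaced by the nearest non-null value in the same group in the chosen direction and is left null if none exists.

```python
def model_replace_nulls(keys, values, policy="PRECEDING"):
    """Plain-Python model of a per-group fill for one column."""
    # Group rows by sorting indices on the key (stable sort).
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    out_keys = [keys[i] for i in order]
    out = [values[i] for i in order]
    # Walk forward for PRECEDING, backward for FOLLOWING.
    if policy == "PRECEDING":
        positions = range(len(out))
    else:
        positions = range(len(out) - 1, -1, -1)
    last_key, last_val = object(), None
    for p in positions:
        if out_keys[p] != last_key:
            # New group: nothing may be carried over from the previous one.
            last_key, last_val = out_keys[p], None
        if out[p] is None:
            out[p] = last_val  # stays None if no non-null value was seen yet
        else:
            last_val = out[p]
    return out_keys, out


keys = [1, 2, 1, 2, 1]
values = [None, 5, 7, None, None]
k_fwd, v_fwd = model_replace_nulls(keys, values, "PRECEDING")
k_bwd, v_bwd = model_replace_nulls(keys, values, "FOLLOWING")
```

The real method takes one policy per column of the input table; the sketch shows a single column to keep the fill logic visible.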
- scan(self, list requests) → tuple
Compute scans on columns.
For details, see cudf::groupby::groupby::scan().
- Parameters:
- requests : List[GroupByRequest]
The list of GroupByRequest objects, each representing a set of scans to perform on a given column of values.
- Returns:
- Tuple[Table, List[Table]]
A tuple whose first element is the grouped keys and whose second element is a list of tables of scan results. One table is returned for each request, with the columns corresponding to the sequence of scans in the request.
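To contrast a scan with an aggregation, the following plain-Python sketch computes a running sum that restarts at each group. It is not pylibcudf code: `model_scan_cumsum` is a hypothetical helper, lists stand in for columns, and it assumes the result has one row per input row, ordered by group.

```python
def model_scan_cumsum(keys, values):
    """Plain-Python model of a grouped cumulative sum for one column."""
    # Stable sort of row indices by key keeps the original order within groups.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    out_keys = [keys[i] for i in order]
    out, running, prev = [], 0, object()
    for i in order:
        if keys[i] != prev:
            # First row of a new group: restart the running total.
            prev, running = keys[i], 0
        running += values[i]
        out.append(running)
    return out_keys, out


scan_keys, scan_sums = model_scan_cumsum([1, 2, 1, 2, 1], [1, 10, 2, 20, 3])
```

Unlike an aggregation, which yields one row per group, the scan in this model yields one row per input row.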
- shift(self, Table values, list offset, list fill_values) → tuple
Compute shifts on columns.
For details, see cudf::groupby::groupby::shift().
- Parameters:
- values : Table
The columns to shift.
- offset : List[int]
The offsets to shift by.
- fill_values : List[Scalar]
The values to use to fill in missing values.
- Returns:
- Tuple[Table, Table]
A tuple whose first element is the group’s keys and whose second element is a table of shifted values.
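A grouped shift moves values within each group and fills the positions that have no source row. The sketch below is plain Python, not pylibcudf code: `model_shift` is a hypothetical helper for a single column, with a plain value standing in for a Scalar. It assumes that element i of a group takes the value at position i - offset of the same group, and the fill value when that position falls outside the group.

```python
def model_shift(keys, values, offset, fill_value):
    """Plain-Python model of a grouped shift for one column."""
    # Group rows by sorting indices on the key (stable sort).
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    out_keys = [keys[i] for i in order]
    vals = [values[i] for i in order]
    out = []
    start = 0
    while start < len(vals):
        # Find the end of the current run of equal keys.
        end = start
        while end < len(vals) and out_keys[end] == out_keys[start]:
            end += 1
        group = vals[start:end]
        for i in range(len(group)):
            src = i - offset
            # Positions with no source row inside the group get the fill value.
            out.append(group[src] if 0 <= src < len(group) else fill_value)
        start = end
    return out_keys, out


keys = [1, 2, 1, 2, 1]
values = [1, 10, 2, 20, 3]
lag_keys, lagged = model_shift(keys, values, offset=1, fill_value=0)
lead_keys, led = model_shift(keys, values, offset=-1, fill_value=0)
```

In this model a positive offset behaves like a lag and a negative offset like a lead, and no value ever crosses from one group into another.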
- class pylibcudf.groupby.GroupByRequest(Column values, list aggregations)
A request for a groupby aggregation or scan.
This class is functionally polymorphic and can represent either an aggregation or a scan depending on the algorithm it is used with. For details on the libcudf types it converts to, see cudf::groupby::aggregation_request and cudf::groupby::scan_request.
- Parameters:
- values : Column
The column to aggregate.
- aggregations : List[Aggregation]
The list of aggregations to perform.