Groups values by keys and computes aggregations on those groups.
#include <groupby.hpp>
Classes
struct | groups
The grouped data corresponding to a groupby operation on a set of values.
Public Member Functions
groupby (groupby const &)=delete
groupby (groupby &&)=delete
groupby & | operator= (groupby const &)=delete
groupby & | operator= (groupby &&)=delete
groupby (table_view const &keys, null_policy null_handling=null_policy::EXCLUDE, sorted keys_are_sorted=sorted::NO, std::vector< order > const &column_order={}, std::vector< null_order > const &null_precedence={})
Construct a groupby object with the specified keys.
std::pair< std::unique_ptr< table >, std::vector< aggregation_result > > | aggregate (host_span< aggregation_request const > requests, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Performs grouped aggregations on the specified values.
std::pair< std::unique_ptr< table >, std::vector< aggregation_result > > | aggregate (host_span< aggregation_request const > requests, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Performs grouped aggregations on the specified values.
std::pair< std::unique_ptr< table >, std::vector< aggregation_result > > | scan (host_span< scan_request const > requests, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Performs grouped scans on the specified values.
std::pair< std::unique_ptr< table >, std::unique_ptr< table > > | shift (table_view const &values, host_span< size_type const > offsets, std::vector< std::reference_wrapper< scalar const >> const &fill_values, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Performs grouped shifts for specified values.
groups | get_groups (cudf::table_view values={}, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Get the grouped keys and values corresponding to a groupby operation on a set of values.
std::pair< std::unique_ptr< table >, std::unique_ptr< table > > | replace_nulls (table_view const &values, host_span< cudf::replace_policy const > replace_policies, rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
Performs grouped null replacement on values.
Groups values by keys and computes aggregations on those groups.
Definition at line 96 of file groupby.hpp.
explicit cudf::groupby::groupby::groupby(table_view const& keys, null_policy null_handling = null_policy::EXCLUDE, sorted keys_are_sorted = sorted::NO, std::vector<order> const& column_order = {}, std::vector<null_order> const& null_precedence = {})
Construct a groupby object with the specified keys.
If the keys are already sorted, better performance may be achieved by passing keys_are_sorted == sorted::YES and indicating the ascending/descending order of each column and the null order in column_order and null_precedence, respectively.
Note: This object does not maintain the lifetime of keys. It is the user's responsibility to ensure the groupby object does not outlive the data viewed by the keys table_view.
keys | Table whose rows act as the groupby keys
null_handling | Indicates whether rows in keys that contain NULL values should be included
keys_are_sorted | Indicates whether rows in keys are already sorted
column_order | If keys_are_sorted == sorted::YES, indicates whether each column is ascending or descending. If empty, all columns are assumed to be ascending. Ignored if keys_are_sorted == sorted::NO.
null_precedence | If keys_are_sorted == sorted::YES, indicates the ordering of null values in each column. If empty, all columns are assumed to use null_order::AFTER. Ignored if keys_are_sorted == sorted::NO.
std::pair<std::unique_ptr<table>, std::vector<aggregation_result>> cudf::groupby::groupby::aggregate(host_span<aggregation_request const> requests, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs grouped aggregations on the specified values.
The values to aggregate and the aggregations to perform are specified in an aggregation_request. Each request contains a column_view of values to aggregate and a set of aggregations to perform on those elements.
For each aggregation in a request, values[i] is aggregated with all other values[j] where rows i and j in keys are equivalent.
The size() of the request column must equal keys.num_rows().
For every aggregation_request an aggregation_result will be returned. The aggregation_result holds the resulting column(s) for each requested aggregation on the request's values. The order of the columns in each result is the same order as was specified in the request.
The returned table contains the group labels for each group, i.e., the unique rows from keys. Element i across all aggregation results belongs to the group at row i in the group labels table.
The order of the rows in the group labels is arbitrary. Furthermore, successive groupby::aggregate calls may return results in different orders.
cudf::logic_error | If requests[i].values.size() != keys.num_rows().
Example:
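As a rough illustration of the semantics described above, the following host-only C++ sketch models a grouped SUM and MAX over one key column and one value column using only the standard library. It does not call libcudf. The names `group_result` and `grouped_sum_max` are hypothetical, and the ascending group order produced by `std::map` is an artifact of the sketch, since the documented row order of the group labels is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Host-side model only (hypothetical names, not libcudf types).
struct group_result {
  std::vector<int> keys;  // group labels, one row per distinct key
  std::vector<int> sums;  // sums[i] belongs to the group at keys[i]
  std::vector<int> maxs;  // maxs[i] belongs to the group at keys[i]
};

group_result grouped_sum_max(std::vector<int> const& keys, std::vector<int> const& values)
{
  // Mirrors the documented precondition: values.size() must equal keys.num_rows().
  assert(keys.size() == values.size());

  // key -> (running sum, running max). std::map iterates in ascending key order,
  // which is a property of this sketch and not a guarantee of the documented API.
  std::map<int, std::pair<int, int>> acc;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    auto [it, inserted] = acc.try_emplace(keys[i], values[i], values[i]);
    if (!inserted) {
      it->second.first += values[i];
      if (values[i] > it->second.second) { it->second.second = values[i]; }
    }
  }

  // Element i of every result column belongs to the group at row i of the labels.
  group_result out;
  for (auto const& [key, sum_max] : acc) {
    out.keys.push_back(key);
    out.sums.push_back(sum_max.first);
    out.maxs.push_back(sum_max.second);
  }
  return out;
}
```

With keys `{1, 2, 1, 3, 2}` and values `{10, 20, 30, 40, 50}`, the sketch yields labels `{1, 2, 3}`, sums `{40, 70, 40}` and maxima `{30, 50, 40}`.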
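requests | The set of columns to aggregate and the aggregations to perform
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory
Returns: Pair containing the table with each group's unique key and a vector of aggregation_results for each request, in the same order as specified in requests.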
std::pair<std::unique_ptr<table>, std::vector<aggregation_result>> cudf::groupby::groupby::aggregate(host_span<aggregation_request const> requests, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs grouped aggregations on the specified values.
The values to aggregate and the aggregations to perform are specified in an aggregation_request. Each request contains a column_view of values to aggregate and a set of aggregations to perform on those elements.
For each aggregation in a request, values[i] is aggregated with all other values[j] where rows i and j in keys are equivalent.
The size() of the request column must equal keys.num_rows().
For every aggregation_request an aggregation_result will be returned. The aggregation_result holds the resulting column(s) for each requested aggregation on the request's values. The order of the columns in each result is the same order as was specified in the request.
The returned table contains the group labels for each group, i.e., the unique rows from keys. Element i across all aggregation results belongs to the group at row i in the group labels table.
The order of the rows in the group labels is arbitrary. Furthermore, successive groupby::aggregate calls may return results in different orders.
cudf::logic_error | If requests[i].values.size() != keys.num_rows().
Example:
requests | The set of columns to aggregate and the aggregations to perform
mr | Device memory resource used to allocate the returned table and columns' device memory
Returns: Pair containing the table with each group's unique key and a vector of aggregation_results for each request, in the same order as specified in requests.
groups cudf::groupby::groupby::get_groups(cudf::table_view values = {}, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Get the grouped keys and values corresponding to a groupby operation on a set of values.
Returns a groups object representing the grouped keys and values. If values is not provided, only a grouping of the keys is performed, and the values of the groups object will be nullptr.
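To make the idea of "grouped keys and values" concrete, here is a host-only C++ sketch that reorders rows so that equal keys are adjacent and records where each group starts. It does not call libcudf. The struct `grouped_rows`, its `offsets` field and the function `group_rows` are hypothetical; the layout of the real groups struct is not described in this section, so the layout used here is an assumption. Unlike get_groups, the sketch always requires a values column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side model only; the field names are hypothetical.
struct grouped_rows {
  std::vector<int> keys;             // keys, reordered so equal keys are adjacent
  std::vector<int> values;           // values, reordered the same way
  std::vector<std::size_t> offsets;  // group g spans rows [offsets[g], offsets[g + 1])
};

grouped_rows group_rows(std::vector<int> const& keys, std::vector<int> const& values)
{
  assert(keys.size() == values.size());

  // A stable sort of row indices by key keeps the input order inside each group.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  grouped_rows out;
  for (std::size_t row = 0; row < order.size(); ++row) {
    std::size_t const src = order[row];
    // A new group starts at the first row and wherever the key changes.
    if (row == 0 || keys[src] != out.keys.back()) { out.offsets.push_back(row); }
    out.keys.push_back(keys[src]);
    out.values.push_back(values[src]);
  }
  out.offsets.push_back(order.size());  // closing offset, equal to the row count
  return out;
}
```

For keys `{2, 1, 2, 1, 3}` and values `{10, 20, 30, 40, 50}`, the sketch produces keys `{1, 1, 2, 2, 3}`, values `{20, 40, 10, 30, 50}` and offsets `{0, 2, 4, 5}`.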
values | Table representing values on which a groupby operation is to be performed
mr | Device memory resource used to allocate the returned tables' device memory in the returned groups
Returns: A groups object representing grouped keys and values
std::pair<std::unique_ptr<table>, std::unique_ptr<table>> cudf::groupby::groupby::replace_nulls(table_view const& values, host_span<cudf::replace_policy const> replace_policies, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs grouped null replacement on values.
For each value[i] == NULL in group j, value[i] is replaced with the first non-null value in group j that precedes or follows value[i], depending on the replace policy. If a non-null value is not found in the specified direction, value[i] is left NULL.
The returned pair contains a table of the sorted keys and a table of the result columns. In each result column, values of the same group are contiguous in memory. Within each group, values keep their original order. The order of the groups is not guaranteed.
Example:
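The per-group fill rule can be sketched on the host with `std::optional` standing in for nullable values. This is a model of the semantics only and does not call libcudf. The enum `fill_direction` and the function `grouped_replace_nulls` are hypothetical stand-ins, and the sketch leaves rows in input order, whereas the documented result places each group contiguously alongside the sorted keys.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for the replace policy; not the libcudf enum.
enum class fill_direction { PRECEDING, FOLLOWING };

// Host-side model: fills nulls within each group and keeps rows in input order.
std::vector<std::optional<int>> grouped_replace_nulls(std::vector<int> const& keys,
                                                      std::vector<std::optional<int>> values,
                                                      fill_direction direction)
{
  assert(keys.size() == values.size());

  // key -> nearest non-null value met so far while walking in the chosen direction
  std::map<int, int> last_seen;
  std::size_t const n = values.size();
  for (std::size_t step = 0; step < n; ++step) {
    // Walk forward for PRECEDING and backward for FOLLOWING.
    std::size_t const i = (direction == fill_direction::PRECEDING) ? step : n - 1 - step;
    if (values[i].has_value()) {
      last_seen[keys[i]] = *values[i];
    } else if (auto it = last_seen.find(keys[i]); it != last_seen.end()) {
      values[i] = it->second;
    }
    // Otherwise the group has no non-null value in that direction and the null stays.
  }
  return values;
}
```

With keys `{0, 1, 0, 1, 0}` and values `{null, 5, 2, null, null}`, the PRECEDING direction gives `{null, 5, 2, 5, 2}` and the FOLLOWING direction gives `{2, 5, 2, null, null}`.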
[in] | values | A table whose columns' null values will be replaced
[in] | replace_policies | Specifies the position of replacement values relative to null values, one for each column
[in] | mr | Device memory resource used to allocate device memory of the returned column
std::pair<std::unique_ptr<table>, std::vector<aggregation_result>> cudf::groupby::groupby::scan(host_span<scan_request const> requests, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs grouped scans on the specified values.
The values to aggregate and the aggregations to perform are specified in a scan_request. Each request contains a column_view of values to aggregate and a set of aggregations to perform on those elements.
For each aggregation in a request, values[i] is scan aggregated with all previous values[j] where rows i and j in keys are equivalent.
The size() of the request column must equal keys.num_rows().
For every scan_request an aggregation_result will be returned. The aggregation_result holds the resulting column(s) for each requested aggregation on the request's values. The order of the columns in each result is the same order as was specified in the request.
The returned table contains the group labels for each row, i.e., the keys given to the groupby object. Element i across all aggregation results belongs to the group at row i in the group labels table.
The order of the rows in the group labels is arbitrary. Furthermore, successive groupby::scan calls may return results in different orders.
cudf::logic_error | If requests[i].values.size() != keys.num_rows().
Example:
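A grouped scan produces one output per input row, where each output combines the current value with the earlier values of the same group. The host-only C++ sketch below models an inclusive running sum this way. It does not call libcudf, `grouped_cumulative_sum` is a hypothetical name, and the sketch keeps rows in input order, which is a simplification given that the documented row order is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Host-side model: inclusive running sum within each group, one output per input row.
std::vector<int> grouped_cumulative_sum(std::vector<int> const& keys,
                                        std::vector<int> const& values)
{
  // Mirrors the documented precondition: values.size() must equal keys.num_rows().
  assert(keys.size() == values.size());

  std::map<int, int> running;  // key -> sum of the values seen so far for that key
  std::vector<int> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    running[keys[i]] += values[i];  // a key seen for the first time starts from 0
    out.push_back(running[keys[i]]);
  }
  return out;
}
```

For keys `{1, 2, 1, 2, 1}` and values `{1, 2, 3, 4, 5}`, the sketch returns `{1, 2, 4, 6, 9}`: rows 0, 2 and 4 accumulate within group 1, and rows 1 and 3 within group 2.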
requests | The set of columns to scan and the scans to perform
mr | Device memory resource used to allocate the returned table and columns' device memory
Returns: Pair containing the table with each group's key and a vector of aggregation_results for each request, in the same order as specified in requests.
std::pair<std::unique_ptr<table>, std::unique_ptr<table>> cudf::groupby::groupby::shift(table_view const& values, host_span<size_type const> offsets, std::vector<std::reference_wrapper<scalar const>> const& fill_values, rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())
Performs grouped shifts for specified values.
In the j-th column, for each group, the i-th element is taken from the (i - offsets[j])-th element of the group. If i - offsets[j] < 0 or i - offsets[j] >= group_size, the value is determined by fill_values[j].
Row i of the returned key table corresponds to the group labels of row i in the shifted columns. The key order in each group matches the input order. The order of the groups is arbitrary. The group order in successive calls to groupby::shift may be different.
Example:
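The indexing rule above can be checked with a small host-only C++ model for a single value column. It does not call libcudf, `grouped_shift` is a hypothetical name, and the sketch writes results back in input row order instead of returning groups contiguously with a key table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Host-side model for one value column: output position i of a group takes the
// group's element i - offset, or fill_value when that index falls outside the group.
std::vector<int> grouped_shift(std::vector<int> const& keys,
                               std::vector<int> const& values,
                               int offset,
                               int fill_value)
{
  assert(keys.size() == values.size());

  // Collect the input row indices of each group, in input order.
  std::map<int, std::vector<std::size_t>> rows_of;
  for (std::size_t row = 0; row < keys.size(); ++row) { rows_of[keys[row]].push_back(row); }

  // Start from fill_value everywhere, then overwrite the positions that have a source.
  std::vector<int> out(values.size(), fill_value);
  for (auto const& entry : rows_of) {
    std::vector<std::size_t> const& rows = entry.second;
    auto const group_size = static_cast<long long>(rows.size());
    for (long long i = 0; i < group_size; ++i) {
      long long const src = i - offset;  // position inside the group to read from
      if (src >= 0 && src < group_size) {
        out[rows[static_cast<std::size_t>(i)]] = values[rows[static_cast<std::size_t>(src)]];
      }
    }
  }
  return out;
}
```

With keys `{1, 2, 1, 2, 1}`, values `{10, 20, 30, 40, 50}` and a fill value of `-1`, an offset of `1` gives `{-1, -1, 10, 20, 30}` and an offset of `-1` gives `{30, 40, 50, -1, -1}`.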
values | Table whose columns are to be shifted
offsets | The offsets by which to shift the input
fill_values | Fill values for indeterminable outputs
mr | Device memory resource used to allocate the returned table and columns' device memory
cudf::logic_error | If the dtype of fill_values[i] does not match the dtype of values[i] for the i-th column