cudf::groupby_host_udf Struct Reference (abstract)

The interface for host-based UDF implementation for groupby aggregation context.

#include <host_udf.hpp>

Inheritance diagram for cudf::groupby_host_udf:
cudf::host_udf_base

Public Member Functions

virtual std::unique_ptr< column > get_empty_output (rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
 Get the output when the input values column is empty.

virtual std::unique_ptr< column > operator() (rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
 Perform the main groupby computation for the host-based UDF.
 
Public Member Functions inherited from cudf::host_udf_base
virtual ~host_udf_base ()=default
 Default destructor.

virtual std::size_t do_hash () const
 Computes hash value of the instance.

virtual bool is_equal (host_udf_base const &other) const =0
 Compares two instances of the derived class for equality.

virtual std::unique_ptr< host_udf_base > clone () const =0
 Clones the instance.
 

Protected Member Functions

column_view get_input_values () const
 Access the input values column.

column_view get_grouped_values () const
 Access the input values grouped according to the input keys, with the values within each group kept in their original order.

column_view get_sorted_grouped_values () const
 Access the input values grouped according to the input keys and sorted within each group.

size_type get_num_groups () const
 Access the number of groups (i.e., number of distinct keys).

device_span< size_type const > get_group_offsets () const
 Access the offsets separating groups.

device_span< size_type const > get_group_labels () const
 Access the group labels (which are the same as the group indices).

column_view compute_aggregation (std::unique_ptr< aggregation > other_agg) const
 Compute a built-in groupby aggregation and access its result.
 

Friends

struct groupby::detail::aggregate_result_functor
 

Detailed Description

The interface for host-based UDF implementation for groupby aggregation context.

A host-based UDF for groupby must be derived from this class. In addition to implementing the virtual functions declared in the base class host_udf_base, the derived class must define get_empty_output() to return the result when the input is empty, and operator() to perform its groupby operations.

During execution, the derived class can access internal data provided by the libcudf groupby framework through a set of get* accessors, and can call other built-in groupby aggregations through the compute_aggregation function.

Note
The derived class can only perform sort-based groupby aggregations. Hash-based groupby aggregations require more complex data structures and are not yet supported.

Example:

struct my_udf_aggregation : cudf::groupby_host_udf {
  my_udf_aggregation() = default;

  [[nodiscard]] std::unique_ptr<column> get_empty_output(
    rmm::cuda_stream_view stream,
    rmm::device_async_resource_ref mr) const override
  {
    // Return a column corresponding to the result when the input values column is empty.
  }

  [[nodiscard]] std::unique_ptr<column> operator()(
    rmm::cuda_stream_view stream,
    rmm::device_async_resource_ref mr) const override
  {
    // Perform UDF computation using the input data and return the result.
  }

  [[nodiscard]] bool is_equal(host_udf_base const& other) const override
  {
    // Check if the other object is also an instance of this class.
    // If there are internal state variables, they may need to be checked for equality as well.
    return dynamic_cast<my_udf_aggregation const*>(&other) != nullptr;
  }

  [[nodiscard]] std::unique_ptr<host_udf_base> clone() const override
  {
    // Return a new instance; copy any internal state variables here as well.
    return std::make_unique<my_udf_aggregation>();
  }
};
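
The comment in is_equal() above notes that internal state may need to be compared as well. The following is a minimal sketch of that pattern, not libcudf code: udf_base is a hypothetical stand-in that mirrors only do_hash(), is_equal() and clone() so the example compiles without libcudf, and scaled_sum_udf, its scale_ member, and the helper clone_preserves_identity() are invented for illustration. The default hash value in the stand-in is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical stand-in for cudf::host_udf_base, reduced to the three
// members discussed above so that the sketch compiles without libcudf.
struct udf_base {
  virtual ~udf_base() = default;
  [[nodiscard]] virtual std::size_t do_hash() const { return 0; }  // assumed default
  [[nodiscard]] virtual bool is_equal(udf_base const& other) const = 0;
  [[nodiscard]] virtual std::unique_ptr<udf_base> clone() const    = 0;
};

// Hypothetical UDF carrying one piece of internal state.
struct scaled_sum_udf : udf_base {
  explicit scaled_sum_udf(int scale) : scale_{scale} {}

  // Fold the state into the hash so that equal instances hash equally.
  [[nodiscard]] std::size_t do_hash() const override { return std::hash<int>{}(scale_); }

  // Equal only if `other` has the same dynamic type and the same state.
  [[nodiscard]] bool is_equal(udf_base const& other) const override
  {
    auto const* o = dynamic_cast<scaled_sum_udf const*>(&other);
    return o != nullptr && o->scale_ == scale_;
  }

  // The clone must carry the state over, otherwise is_equal() would fail on it.
  [[nodiscard]] std::unique_ptr<udf_base> clone() const override
  {
    return std::make_unique<scaled_sum_udf>(scale_);
  }

 private:
  int scale_;
};

// Checks the contract the three functions should satisfy together:
// a clone compares equal to its source and produces the same hash.
inline bool clone_preserves_identity(udf_base const& udf)
{
  auto const copy = udf.clone();
  assert(copy != nullptr);
  return udf.is_equal(*copy) && udf.do_hash() == copy->do_hash();
}
```

The point of the sketch is that do_hash(), is_equal() and clone() should agree with each other: any member that takes part in is_equal() should also be hashed and copied.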

Definition at line 267 of file host_udf.hpp.

Member Function Documentation

◆ compute_aggregation()

column_view cudf::groupby_host_udf::compute_aggregation ( std::unique_ptr< aggregation > other_agg ) const
inline, protected

Compute a built-in groupby aggregation and access its result.

This allows the derived class to call any other built-in groupby aggregation on the same input values column and use its output in its own computation.

Parameters
other_agg: An arbitrary built-in groupby aggregation
Returns
A column_view object corresponding to the output result of the given aggregation

Definition at line 410 of file host_udf.hpp.

◆ get_empty_output()

virtual std::unique_ptr<column> cudf::groupby_host_udf::get_empty_output ( rmm::cuda_stream_view  stream,
rmm::device_async_resource_ref  mr 
) const
pure virtual

Get the output when the input values column is empty.

This is called in libcudf when the input values column is empty. In such situations libcudf tries to generate the output directly without unnecessarily evaluating the intermediate data.

Parameters
stream: The CUDA stream to use for any kernel launches
mr: Device memory resource to use for any allocations
Returns
The output result of the aggregation when the input values column is empty

◆ get_group_labels()

device_span<size_type const> cudf::groupby_host_udf::get_group_labels ( ) const
inline, protected

Access the group labels (which are the same as the group indices).

Returns
The array of group labels.

Definition at line 395 of file host_udf.hpp.

◆ get_group_offsets()

device_span<size_type const> cudf::groupby_host_udf::get_group_offsets ( ) const
inline, protected

Access the offsets separating groups.

Returns
The array of group offsets.

Definition at line 384 of file host_udf.hpp.
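
To make the relationship between group offsets and group labels concrete, here is a host-side sketch in plain C++. It does not call libcudf and works on std::vector rather than device_span. It assumes that the offsets array has one entry per group boundary (number of groups plus one, starting at 0) and that the labels array has one entry per grouped row; the function name labels_from_offsets is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // same width as cudf::size_type

// Expand group offsets into one group label per row.
// Assumes offsets = {0, end of group 0, end of group 1, ...}.
std::vector<size_type> labels_from_offsets(std::vector<size_type> const& offsets)
{
  assert(!offsets.empty());
  assert(offsets.front() == 0);

  std::vector<size_type> labels;
  labels.reserve(static_cast<std::size_t>(offsets.back()));

  auto const num_groups = static_cast<size_type>(offsets.size()) - 1;
  for (size_type g = 0; g < num_groups; ++g) {
    auto const group_size = offsets[g + 1] - offsets[g];
    assert(group_size >= 0);
    // Every row in [offsets[g], offsets[g + 1]) belongs to group g.
    labels.insert(labels.end(), static_cast<std::size_t>(group_size), g);
  }
  return labels;
}
```

Under these assumptions, offsets {0, 3, 5, 9} describe three groups of sizes 3, 2 and 4, and expand to the labels {0, 0, 0, 1, 1, 2, 2, 2, 2}.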

◆ get_grouped_values()

column_view cudf::groupby_host_udf::get_grouped_values ( ) const
inline, protected

Access the input values grouped according to the input keys, with the values within each group kept in their original order.

Returns
The grouped values column.

Definition at line 350 of file host_udf.hpp.

◆ get_input_values()

column_view cudf::groupby_host_udf::get_input_values ( ) const
inline, protected

Access the input values column.

Returns
The input values column.

Definition at line 338 of file host_udf.hpp.

◆ get_num_groups()

size_type cudf::groupby_host_udf::get_num_groups ( ) const
inline, protected

Access the number of groups (i.e., number of distinct keys).

Returns
The number of groups.

Definition at line 373 of file host_udf.hpp.

◆ get_sorted_grouped_values()

column_view cudf::groupby_host_udf::get_sorted_grouped_values ( ) const
inline, protected

Access the input values grouped according to the input keys and sorted within each group.

Returns
The sorted grouped values column.

Definition at line 362 of file host_udf.hpp.
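
The difference between get_grouped_values() and get_sorted_grouped_values() can be modelled on the host with plain C++. The sketch below is illustrative only and does not use libcudf: both function names are hypothetical, keys and values are plain integer vectors, and it assumes groups appear in ascending key order, which is a choice of this model rather than something the documentation above states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Model of "grouped values": rows with equal keys become adjacent and
// keep their original relative order (stable sort on the key only).
std::vector<int> grouped_values_model(std::vector<int> const& keys,
                                      std::vector<int> const& values)
{
  assert(keys.size() == values.size());
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  std::vector<int> out;
  out.reserve(values.size());
  for (auto const i : order) { out.push_back(values[i]); }
  return out;
}

// Model of "sorted grouped values": same grouping, but rows inside each
// group are additionally ordered by value (sort on the pair key, value).
std::vector<int> sorted_grouped_values_model(std::vector<int> const& keys,
                                             std::vector<int> const& values)
{
  assert(keys.size() == values.size());
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return keys[a] != keys[b] ? keys[a] < keys[b] : values[a] < values[b];
  });

  std::vector<int> out;
  out.reserve(values.size());
  for (auto const i : order) { out.push_back(values[i]); }
  return out;
}
```

With keys {2, 1, 2, 1, 2} and values {30, 20, 10, 40, 5}, the first model yields {20, 40, 30, 10, 5} and the second yields {20, 40, 5, 10, 30}. An order-sensitive UDF (for example, "first value per group") would want the former, while a rank- or median-like UDF would want the latter.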

◆ operator()()

virtual std::unique_ptr<column> cudf::groupby_host_udf::operator() ( rmm::cuda_stream_view  stream,
rmm::device_async_resource_ref  mr 
) const
pure virtual

Perform the main groupby computation for the host-based UDF.

Parameters
stream: The CUDA stream to use for any kernel launches
mr: Device memory resource to use for any allocations
Returns
The output result of the aggregation
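
As a rough picture of what an operator() implementation does with the accessors, the following host-side sketch computes one output value per group from grouped values and group offsets. It is a plain C++ model, not libcudf code: a real implementation would operate on device data using the given stream and memory resource and return a column. The function name sum_of_squares_per_group is hypothetical, and the offsets layout (number of groups plus one, starting at 0) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // same width as cudf::size_type

// Host-side model of a per-group reduction: for each group, sum the
// squares of its values. Produces exactly one output row per group.
std::vector<std::int64_t> sum_of_squares_per_group(
  std::vector<std::int32_t> const& grouped_values,
  std::vector<size_type> const& group_offsets)
{
  assert(!group_offsets.empty());
  assert(group_offsets.front() == 0);
  assert(static_cast<std::size_t>(group_offsets.back()) == grouped_values.size());

  auto const num_groups = group_offsets.size() - 1;
  std::vector<std::int64_t> result(num_groups, 0);

  for (std::size_t g = 0; g < num_groups; ++g) {
    // Rows of group g occupy the half-open range [offsets[g], offsets[g + 1]).
    for (auto i = group_offsets[g]; i < group_offsets[g + 1]; ++i) {
      auto const v = static_cast<std::int64_t>(grouped_values[static_cast<std::size_t>(i)]);
      result[g] += v * v;
    }
  }
  return result;
}
```

In this model an empty input (offsets {0}, no values) produces an empty result, which is the case that get_empty_output() covers in the real interface.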

The documentation for this struct was generated from the following file: