cudf::host_udf_base Struct Reference (abstract)

The interface for host-based UDF implementation. More...

#include <host_udf.hpp>

Classes

struct  data_attribute
 Describe possible data that may be needed in the derived class for its operations. More...
 

Public Types

enum class  groupby_data_attribute : int32_t {
  INPUT_VALUES , GROUPED_VALUES , SORTED_GROUPED_VALUES , NUM_GROUPS ,
  GROUP_OFFSETS , GROUP_LABELS
}
 Define the possible data needed for groupby aggregations. More...
 
using data_attribute_set_t = std::unordered_set< data_attribute, data_attribute::hash, data_attribute::equal_to >
 Set of attributes for the input data that is needed for computing the aggregation.
 
using input_data_t = std::variant< column_view, size_type, device_span< size_type const > >
 Hold all possible types of the data that is passed to the derived class for executing the aggregation.
 
using input_map_t = std::unordered_map< data_attribute, input_data_t, data_attribute::hash, data_attribute::equal_to >
 Input to the aggregation, mapping from each data attribute to its actual data.
 
using output_t = std::variant< std::unique_ptr< column > >
 Output type of the aggregation. More...
 

Public Member Functions

virtual data_attribute_set_t get_required_data () const
 Return a set of attributes for the data that is needed for computing the aggregation. More...
 
virtual output_t get_empty_output (std::optional< data_type > output_dtype, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
 Get the output when the input values column is empty. More...
 
virtual output_t operator() (input_map_t const &input, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
 Perform the main computation for the host-based UDF. More...
 
virtual std::size_t do_hash () const
 Computes hash value of the class's instance. More...
 
virtual bool is_equal (host_udf_base const &other) const =0
 Compares two instances of the derived class for equality. More...
 
virtual std::unique_ptr< host_udf_base > clone () const =0
 Clones the instance. More...
 

Detailed Description

The interface for host-based UDF implementation.

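An implementation of a host-based UDF must derive from this base class and define its own versions of the required functions. In particular, the derived class must implement the pure virtual functions get_empty_output, operator(), is_equal, and clone. It may also override do_hash and get_required_data, which have default implementations.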

Example of such an implementation:

struct my_udf_aggregation : cudf::host_udf_base {
  my_udf_aggregation() = default;

  // This UDF aggregation needs `GROUPED_VALUES` and `GROUP_OFFSETS`,
  // and the result from groupby `MAX` aggregation.
  [[nodiscard]] data_attribute_set_t get_required_data() const override
  {
    return {groupby_data_attribute::GROUPED_VALUES,
            groupby_data_attribute::GROUP_OFFSETS,
            cudf::make_max_aggregation<cudf::groupby_aggregation>()};
  }

  [[nodiscard]] output_t get_empty_output(
    [[maybe_unused]] std::optional<cudf::data_type> output_dtype,
    [[maybe_unused]] rmm::cuda_stream_view stream,
    [[maybe_unused]] rmm::device_async_resource_ref mr) const override
  {
    // This UDF aggregation always returns a column of type INT32.
    return cudf::make_empty_column(cudf::data_type{cudf::type_id::INT32});
  }

  [[nodiscard]] output_t operator()(input_map_t const& input,
                                    rmm::cuda_stream_view stream,
                                    rmm::device_async_resource_ref mr) const override
  {
    // Perform UDF computation using the input data and return the result.
  }

  [[nodiscard]] bool is_equal(host_udf_base const& other) const override
  {
    // Check if the other object is also an instance of this class.
    return dynamic_cast<my_udf_aggregation const*>(&other) != nullptr;
  }

  [[nodiscard]] std::unique_ptr<host_udf_base> clone() const override
  {
    return std::make_unique<my_udf_aggregation>();
  }
};

Definition at line 99 of file host_udf.hpp.

Member Typedef Documentation

◆ output_t

using cudf::host_udf_base::output_t = std::variant<std::unique_ptr<column> >

Output type of the aggregation.

Currently only a single type is supported as the output of the aggregation, but it will hold more types in the future when reduction is supported.

Definition at line 237 of file host_udf.hpp.
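
As an illustration of working with a single-alternative variant, the following is a minimal sketch of how a caller might take the column out of an output_t value. The `column` struct here is a hypothetical stand-in for cudf::column so that the snippet compiles on its own, and `take_column` is an illustrative helper, not part of libcudf.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical stand-in for cudf::column, so the sketch compiles on its own.
struct column {
  int size{};
};

// Mirrors the shape of host_udf_base::output_t.
using output_t = std::variant<std::unique_ptr<column>>;

// Take ownership of the column held by the variant.
// std::get throws std::bad_variant_access if the variant does not hold
// the requested alternative.
std::unique_ptr<column> take_column(output_t&& out)
{
  auto col = std::get<std::unique_ptr<column>>(std::move(out));
  assert(col != nullptr);
  return col;
}
```

Accessing the alternative by type rather than by index keeps such code valid if more alternatives are added to the variant later.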

Member Enumeration Documentation

◆ groupby_data_attribute

Define the possible data needed for groupby aggregations.

Note that only sort-based groupby aggregations are supported.

Enumerator
INPUT_VALUES 

The input values column.

GROUPED_VALUES 

The input values grouped according to the input keys. The values within each group maintain their original order.

SORTED_GROUPED_VALUES 

The input values grouped according to the input keys and sorted within each group.

NUM_GROUPS 

The number of groups (i.e., number of distinct keys).

GROUP_OFFSETS 

The offsets separating groups.

GROUP_LABELS 

Group labels (which are the same as group indices).

Definition at line 108 of file host_udf.hpp.
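
To make the relationship between these attributes concrete, here is a host-side sketch that derives analogous data for a small input using only the standard library. It is an illustration under assumptions, not how libcudf computes the data: `grouped_data` and `group_by_key` are hypothetical names, groups are assumed to be ordered by key, and the offsets are assumed to contain one entry per group plus a final entry equal to the number of rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side illustration only; all names here are hypothetical.
struct grouped_data {
  std::vector<int> grouped_values;  // analogous to GROUPED_VALUES
  std::vector<int> group_offsets;   // analogous to GROUP_OFFSETS
  std::vector<int> group_labels;    // analogous to GROUP_LABELS
  int num_groups{};                 // analogous to NUM_GROUPS
};

grouped_data group_by_key(std::vector<int> const& keys, std::vector<int> const& values)
{
  assert(keys.size() == values.size());

  // A stable sort of row indices by key keeps the original order within each group.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  grouped_data out;
  for (std::size_t i = 0; i < order.size(); ++i) {
    // A new group starts whenever the key changes.
    if (i == 0 || keys[order[i]] != keys[order[i - 1]]) {
      out.group_offsets.push_back(static_cast<int>(i));
      ++out.num_groups;
    }
    out.grouped_values.push_back(values[order[i]]);
    out.group_labels.push_back(out.num_groups - 1);
  }
  // The final offset closes the last group (assumed num_groups + 1 entries).
  out.group_offsets.push_back(static_cast<int>(order.size()));
  return out;
}
```

For keys {2, 1, 2, 1} and values {10, 20, 30, 40}, this sketch yields grouped values {20, 40, 10, 30}, offsets {0, 2, 4}, labels {0, 0, 1, 1}, and 2 groups.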

Member Function Documentation

◆ clone()

virtual std::unique_ptr<host_udf_base> cudf::host_udf_base::clone ( ) const
pure virtual

Clones the instance.

A class derived from host_udf_base should not store too much data, so that its instances remain lightweight and can be cloned efficiently.

Returns
A new instance cloned from this

◆ do_hash()

virtual std::size_t cudf::host_udf_base::do_hash ( ) const
inline virtual

Computes hash value of the class's instance.

Returns
The hash value of the instance

Definition at line 270 of file host_udf.hpp.

◆ get_empty_output()

virtual output_t cudf::host_udf_base::get_empty_output ( std::optional< data_type >  output_dtype,
rmm::cuda_stream_view  stream,
rmm::device_async_resource_ref  mr 
) const
pure virtual

Get the output when the input values column is empty.

libcudf calls this function when the input values column is empty. In that case libcudf tries to generate the output directly, without evaluating any intermediate data.

Parameters
output_dtype: The expected output data type
stream: The CUDA stream to use for any kernel launches
mr: Device memory resource to use for any allocations
Returns
The output result of the aggregation when the input values column is empty
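
Because output_dtype is optional, an implementation has to decide what to do when no type is provided. The sketch below shows one possible policy, falling back to a fixed type via std::optional::value_or. The `type_id`, `data_type`, and `column` definitions are simplified hypothetical stand-ins for the cudf types, and the fallback policy itself is an assumption rather than documented libcudf behavior.

```cpp
#include <memory>
#include <optional>

// Simplified, hypothetical stand-ins for the cudf types of the same names.
enum class type_id { INT32, FLOAT64 };
struct data_type {
  type_id id;
};
struct column {
  data_type type;
  int size;
};

// Build an empty result, using the requested type when one is given
// and INT32 otherwise (the fallback is this sketch's own choice).
std::unique_ptr<column> make_empty_output(std::optional<data_type> output_dtype)
{
  auto const dtype = output_dtype.value_or(data_type{type_id::INT32});
  return std::make_unique<column>(column{dtype, 0});
}
```

A UDF whose output type never varies can instead ignore the parameter, as the class-level example does.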

◆ get_required_data()

virtual data_attribute_set_t cudf::host_udf_base::get_required_data ( ) const
inline virtual

Return a set of attributes for the data that is needed for computing the aggregation.

The derived class should return only the attributes for the data that it actually needs, to avoid unnecessary computation in libcudf. If this function is not overridden, an empty set is returned, which means that all the data attributes (except results from other groupby aggregations) are needed.

Returns
A set of data_attribute

Definition at line 217 of file host_udf.hpp.

◆ is_equal()

virtual bool cudf::host_udf_base::is_equal ( host_udf_base const &  other) const
pure virtual

Compares two instances of the derived class for equality.

Parameters
other: The other derived class's instance to compare with
Returns
True if the two instances are equal
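
For a UDF that carries configuration, equality and hashing usually need to agree: instances that compare equal should produce the same hash. The following sketch shows one way to keep is_equal, do_hash, and clone consistent. `udf_base` is a hypothetical, reduced stand-in for host_udf_base (here do_hash is pure virtual for brevity, unlike the real interface), and `scaled_sum_udf` is an illustrative derived class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical, reduced stand-in for cudf::host_udf_base (three members only).
struct udf_base {
  virtual ~udf_base() = default;
  [[nodiscard]] virtual std::size_t do_hash() const                     = 0;
  [[nodiscard]] virtual bool is_equal(udf_base const& other) const     = 0;
  [[nodiscard]] virtual std::unique_ptr<udf_base> clone() const        = 0;
};

// Hypothetical UDF carrying a single configuration value.
struct scaled_sum_udf : udf_base {
  explicit scaled_sum_udf(int scale) : scale_{scale} {}

  // Hash exactly the state that is_equal compares.
  [[nodiscard]] std::size_t do_hash() const override { return std::hash<int>{}(scale_); }

  [[nodiscard]] bool is_equal(udf_base const& other) const override
  {
    // Equal only if the other object has the same dynamic type and state.
    auto const* o = dynamic_cast<scaled_sum_udf const*>(&other);
    return o != nullptr && o->scale_ == scale_;
  }

  [[nodiscard]] std::unique_ptr<udf_base> clone() const override
  {
    auto copy = std::make_unique<scaled_sum_udf>(scale_);
    assert(copy->is_equal(*this));  // a clone should compare equal to its source
    return copy;
  }

 private:
  int scale_;
};
```

Keeping the stored state to a single integer also keeps cloning cheap.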

◆ operator()()

virtual output_t cudf::host_udf_base::operator() ( input_map_t const &  input,
rmm::cuda_stream_view  stream,
rmm::device_async_resource_ref  mr 
) const
pure virtual

Perform the main computation for the host-based UDF.

Parameters
input: The input data needed for performing all computation
stream: The CUDA stream to use for any kernel launches
mr: Device memory resource to use for any allocations
Returns
The output result of the aggregation
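
Since each entry of the input map is a variant, the implementation has to look up an attribute and then extract the alternative it expects. The sketch below shows that pattern on the host with standard containers. All types are hypothetical stand-ins: a plain enum replaces data_attribute, and std::vector and int replace column_view, device_span, and size_type. The per-group sum is only an example computation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical host-side stand-ins for the libcudf types.
enum class attr { GROUPED_VALUES, NUM_GROUPS, GROUP_OFFSETS };
using values_t     = std::vector<double>;
using offsets_t    = std::vector<int>;
using input_data_t = std::variant<values_t, int, offsets_t>;
using input_map_t  = std::unordered_map<attr, input_data_t>;

// Sum the values of each group, reading both inputs out of the map.
std::vector<double> group_sums(input_map_t const& input)
{
  // at() throws std::out_of_range if the attribute was not provided;
  // std::get throws std::bad_variant_access on a type mismatch.
  auto const& values  = std::get<values_t>(input.at(attr::GROUPED_VALUES));
  auto const& offsets = std::get<offsets_t>(input.at(attr::GROUP_OFFSETS));
  assert(!offsets.empty());

  std::vector<double> sums(offsets.size() - 1, 0.0);
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    for (int i = offsets[g]; i < offsets[g + 1]; ++i) {
      sums[g] += values[static_cast<std::size_t>(i)];
    }
  }
  return sums;
}
```

With grouped values {1, 2, 3, 4} and offsets {0, 1, 4}, the sketch returns the two sums 1 and 9.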

The documentation for this struct was generated from the following file: host_udf.hpp