The interface for host-based UDF implementation.
#include <host_udf.hpp>
Classes
struct | data_attribute
Describe possible data that may be needed in the derived class for its operations.
Public Types
enum class | groupby_data_attribute : int32_t { INPUT_VALUES, GROUPED_VALUES, SORTED_GROUPED_VALUES, NUM_GROUPS, GROUP_OFFSETS, GROUP_LABELS }
Define the possible data needed for groupby aggregations.
using | data_attribute_set_t = std::unordered_set< data_attribute, data_attribute::hash, data_attribute::equal_to >
Set of attributes for the input data that is needed for computing the aggregation.
using | input_data_t = std::variant< column_view, size_type, device_span< size_type const > >
Hold all possible types of the data that is passed to the derived class for executing the aggregation.
using | input_map_t = std::unordered_map< data_attribute, input_data_t, data_attribute::hash, data_attribute::equal_to >
Input to the aggregation, mapping from each data attribute to its actual data.
using | output_t = std::variant< std::unique_ptr< column > >
Output type of the aggregation.
Public Member Functions
virtual data_attribute_set_t | get_required_data () const
Return a set of attributes for the data that is needed for computing the aggregation.
virtual output_t | get_empty_output (std::optional< data_type > output_dtype, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
Get the output when the input values column is empty.
virtual output_t | operator() (input_map_t const &input, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
Perform the main computation for the host-based UDF.
virtual std::size_t | do_hash () const
Computes hash value of the class's instance.
virtual bool | is_equal (host_udf_base const &other) const =0
Compares two instances of the derived class for equality.
virtual std::unique_ptr< host_udf_base > | clone () const =0
Clones the instance.
The interface for host-based UDF implementation.
An implementation of a host-based UDF must derive from this base class and define its own versions of the required functions. In particular:
- The derived class is required to implement the get_empty_output, operator(), is_equal, and clone functions.
- If necessary, the derived class can also override do_hash to compute a hash for its instance, and get_required_data to selectively access the input data as well as intermediate data provided by libcudf.
Example of such an implementation:
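As a minimal sketch of this pattern, assuming plain host types in place of libcudf's column, CUDA stream, and memory-resource types, a derived class could look like the following. The names mock_udf_base and scaled_sum_udf are hypothetical and are not part of libcudf; only the member-function names mirror the interface documented on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for host_udf_base: same member names, but plain host
// containers replace column_view, the CUDA stream, and the memory resource.
struct mock_udf_base {
  using input_t  = std::vector<int>;  // assumption: stands in for input_map_t
  using output_t = std::vector<int>;  // assumption: stands in for output_t

  virtual ~mock_udf_base() = default;

  // Required overrides (pure virtual in the documented interface).
  virtual output_t get_empty_output() const                    = 0;
  virtual output_t operator()(input_t const& input) const      = 0;
  virtual bool is_equal(mock_udf_base const& other) const      = 0;
  virtual std::unique_ptr<mock_udf_base> clone() const         = 0;

  // Optional override. Returning 0 is this mock's default, not libcudf's.
  virtual std::size_t do_hash() const { return 0; }
};

// Hypothetical UDF: sums the input values and multiplies the sum by a scale.
struct scaled_sum_udf : mock_udf_base {
  explicit scaled_sum_udf(int scale) : scale_{scale} {}

  // Output for an empty input: an empty result.
  output_t get_empty_output() const override { return {}; }

  // Main computation: one output value holding scale * sum(input).
  output_t operator()(input_t const& input) const override
  {
    int const sum = std::accumulate(input.begin(), input.end(), 0);
    return {sum * scale_};
  }

  // Equal only to another scaled_sum_udf with the same scale.
  bool is_equal(mock_udf_base const& other) const override
  {
    auto const* o = dynamic_cast<scaled_sum_udf const*>(&other);
    return o != nullptr && o->scale_ == scale_;
  }

  // Hash only the state that is_equal compares, so equal instances hash equally.
  std::size_t do_hash() const override { return std::hash<int>{}(scale_); }

  // The only state is one int, so cloning stays cheap.
  std::unique_ptr<mock_udf_base> clone() const override
  {
    return std::make_unique<scaled_sum_udf>(*this);
  }

 private:
  int scale_;
};
```

With this mock, `scaled_sum_udf{3}` applied to `{1, 2, 3}` yields a single value, and a clone compares equal to its source and hashes identically.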
Definition at line 99 of file host_udf.hpp.
using cudf::host_udf_base::output_t = std::variant<std::unique_ptr<column> >
Output type of the aggregation.
Currently only a single type is supported as the output of the aggregation, but it will hold more types in the future when reduction is supported.
Definition at line 237 of file host_udf.hpp.
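To illustrate how a caller might unpack a single-alternative variant like this one, here is a hedged sketch using only the standard library. mock_column, mock_output_t, and take_column are hypothetical names; mock_column merely stands in for cudf::column.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for cudf::column.
struct mock_column {
  std::vector<int> values;
};

// Same shape as output_t: a variant with one alternative today.
using mock_output_t = std::variant<std::unique_ptr<mock_column>>;

// Move the column out of the variant. std::get throws std::bad_variant_access
// if another alternative is active, which only matters once more are added.
inline std::unique_ptr<mock_column> take_column(mock_output_t&& out)
{
  return std::get<std::unique_ptr<mock_column>>(std::move(out));
}
```

Wrapping the single type in a variant lets new alternatives be added later without changing the signature of functions that return it; callers that use std::get with an explicit type keep compiling as alternatives are added.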
enum class cudf::host_udf_base::groupby_data_attribute : int32_t | strong
Define the possible data needed for groupby aggregations.
Note that only sort-based groupby aggregations are supported.
Definition at line 108 of file host_udf.hpp.
virtual std::unique_ptr< host_udf_base > cudf::host_udf_base::clone () const | pure virtual
Clones the instance.
A class derived from host_udf_base should not store too much data, so that its instances remain lightweight and cheap to clone.
virtual std::size_t cudf::host_udf_base::do_hash () const | inline, virtual
Computes hash value of the class's instance.
Definition at line 270 of file host_udf.hpp.
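virtual output_t cudf::host_udf_base::get_empty_output (std::optional< data_type > output_dtype, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const | pure virtual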
Get the output when the input values column is empty.
This is called in libcudf when the input values column is empty. In such situations libcudf tries to generate the output directly without unnecessarily evaluating the intermediate data.
Parameters
output_dtype | The expected output data type
stream | The CUDA stream to use for any kernel launches
mr | Device memory resource to use for any allocations
virtual data_attribute_set_t cudf::host_udf_base::get_required_data () const | inline, virtual
Return a set of attributes for the data that is needed for computing the aggregation.
The derived class should return only the attributes for the data it actually needs, so that libcudf does not perform unnecessary computation. If this function is not overridden, an empty set is returned, which means that all data attributes (except results from other aggregations in groupby) will be needed.
Returns
A set of data_attribute
Definition at line 217 of file host_udf.hpp.
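The "empty set means everything" convention described above can be sketched as follows. This is an illustration under assumptions, not libcudf's implementation: mock_attr reuses the enumerator names of groupby_data_attribute, while mock_attr_set and resolve_required are hypothetical, and the sketch ignores the case where an attribute refers to another aggregation's result.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Enumerator names mirror groupby_data_attribute from this page.
enum class mock_attr : std::int32_t {
  INPUT_VALUES,
  GROUPED_VALUES,
  SORTED_GROUPED_VALUES,
  NUM_GROUPS,
  GROUP_OFFSETS,
  GROUP_LABELS
};

using mock_attr_set = std::unordered_set<mock_attr>;

// Hypothetical caller-side helper: a non-empty request is used as-is, while
// an empty request is expanded to every groupby data attribute.
inline mock_attr_set resolve_required(mock_attr_set const& requested)
{
  if (!requested.empty()) { return requested; }
  return {mock_attr::INPUT_VALUES,
          mock_attr::GROUPED_VALUES,
          mock_attr::SORTED_GROUPED_VALUES,
          mock_attr::NUM_GROUPS,
          mock_attr::GROUP_OFFSETS,
          mock_attr::GROUP_LABELS};
}
```

Under this convention, a derived class that only needs the group count would return a one-element set, and the caller would skip preparing the other five attributes.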
virtual bool cudf::host_udf_base::is_equal (host_udf_base const &other) const | pure virtual
Compares two instances of the derived class for equality.
Parameters
other | The other derived class's instance to compare with
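virtual output_t cudf::host_udf_base::operator() (input_map_t const &input, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const | pure virtual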
Perform the main computation for the host-based UDF.
Parameters
input | The input data needed for performing all computation
stream | The CUDA stream to use for any kernel launches
mr | Device memory resource to use for any allocations
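Because the input is a map from attribute to a variant, the computation has to look up each attribute and extract the alternative it expects. The host-only sketch below shows that access pattern under stated assumptions: input_key, values_t, offsets_t, mock_input_map, and group_sums are hypothetical; std::vector stands in for column_view and device_span; and the offsets are assumed to hold num_groups + 1 entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical subset of the attributes a UDF might request.
enum class input_key : std::int32_t { GROUPED_VALUES, NUM_GROUPS, GROUP_OFFSETS };

using size_type  = std::int32_t;            // same width as cudf::size_type
using values_t   = std::vector<double>;     // stand-in for column_view
using offsets_t  = std::vector<size_type>;  // stand-in for device_span<size_type const>
using mock_input_data = std::variant<values_t, size_type, offsets_t>;
using mock_input_map  = std::unordered_map<input_key, mock_input_data>;

// Per-group sum over values that are already grouped contiguously.
// at() throws if an attribute is missing; std::get throws if the stored
// alternative is not the expected one.
inline std::vector<double> group_sums(mock_input_map const& input)
{
  auto const& values  = std::get<values_t>(input.at(input_key::GROUPED_VALUES));
  auto const  n       = std::get<size_type>(input.at(input_key::NUM_GROUPS));
  auto const& offsets = std::get<offsets_t>(input.at(input_key::GROUP_OFFSETS));

  std::vector<double> sums(static_cast<std::size_t>(n), 0.0);
  for (size_type g = 0; g < n; ++g) {
    // Group g covers the half-open range [offsets[g], offsets[g + 1]).
    for (size_type i = offsets[g]; i < offsets[g + 1]; ++i) {
      sums[g] += values[i];
    }
  }
  return sums;
}
```

In a real derived class the same lookups would yield device-side views, and the loop would be replaced by device code launched on the supplied stream, with the output allocated from the supplied memory resource.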