Files | |
file | reduction.hpp |
Enumerations | |
enum class | cudf::scan_type : bool { INCLUSIVE , EXCLUSIVE } |
Enum to describe scan operation type. | |
std::pair<std::unique_ptr<scalar>, std::unique_ptr<scalar>> cudf::minmax(
  column_view const& col,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Determines the minimum and maximum values of a column.
col | Column whose minimum and maximum are computed |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned scalars' device memory |
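As a rough illustration of what a combined min/max pass computes, here is a minimal host-side sketch in standard C++. It is not libcudf code: `minmax_model`, `nullable_i32`, and `minmax_pair` are hypothetical names, `std::nullopt` stands in for a null row, and skipping nulls is an assumption borrowed from the `cudf::reduce` description, since this page does not state how `cudf::minmax` treats nulls.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Host-side model only; std::nullopt stands in for a null row.
using nullable_i32 = std::optional<std::int32_t>;
using minmax_pair  = std::pair<std::int32_t, std::int32_t>;

// Returns {min, max} over the non-null entries in a single pass, or
// std::nullopt when no non-null entry exists.
// Skipping nulls is an assumption of this model.
std::optional<minmax_pair> minmax_model(std::vector<nullable_i32> const& col)
{
  std::optional<minmax_pair> result;
  for (auto const& v : col) {
    if (!v) { continue; }  // skip null rows
    if (!result) {
      result = minmax_pair{*v, *v};  // first valid value seeds both ends
    } else {
      if (*v < result->first) { result->first = *v; }
      if (*v > result->second) { result->second = *v; }
    }
  }
  return result;
}
```

Computing both extremes in one pass reads the data once, rather than once per separate min and max reduction.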
std::unique_ptr<scalar> cudf::reduce(
  column_view const& col,
  reduce_aggregation const& agg,
  data_type output_type,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Computes the reduction of the values in all rows of a column.
This function does not detect overflows in reductions except for the SUM_WITH_OVERFLOW aggregation. When output_type does not match col.type(), the input values may be promoted to int64_t or double for computing aggregations and then cast to output_type before returning.
The SUM_WITH_OVERFLOW aggregation is a special case that detects integer overflow during summation of int64_t values and returns a struct containing both the sum result and an overflow flag.
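The {sum, overflow_flag} result can be pictured with a small host-side model. This is only a sketch of the idea in standard C++, not the libcudf kernel: `sum_with_overflow_model` and `sum_with_overflow_result` are hypothetical names, and the model stops accumulating once overflow is detected because this page does not say what value the sum field holds in that case.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side mirror of the {sum, overflow_flag} struct.
struct sum_with_overflow_result {
  std::int64_t sum;
  bool overflow;
};

// Sums int64 values and reports whether the true sum left the int64 range.
// The check runs before each addition, so no signed overflow ever occurs.
sum_with_overflow_result sum_with_overflow_model(std::vector<std::int64_t> const& values)
{
  constexpr auto max = std::numeric_limits<std::int64_t>::max();
  constexpr auto min = std::numeric_limits<std::int64_t>::min();
  sum_with_overflow_result r{0, false};
  for (auto const v : values) {
    bool const would_overflow =
      (v > 0 && r.sum > max - v) || (v < 0 && r.sum < min - v);
    if (would_overflow) {
      r.overflow = true;
      break;  // model choice: keep the last in-range partial sum
    }
    r.sum += v;
  }
  return r;
}
```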
Only min and max operations are supported for reduction of non-arithmetic types (e.g. timestamp or string).
Null values are skipped by the operation. If the reduction fails, the output scalar is returned with is_valid()==false.
For empty or all-null input, the result is generally an invalid scalar except for specific aggregations where the aggregation has a well-defined output.
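The null-skipping and invalid-result rules for a SUM reduction can be modelled on the host as follows. This is a sketch under stated assumptions, not libcudf code: `sum_skipping_nulls` is a hypothetical name, `std::nullopt` in the input stands in for a null row, and a `std::nullopt` return stands in for a scalar with is_valid()==false.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Host-side model of a SUM reduction over a nullable int32 column.
// Null rows are skipped; an empty or all-null column yields std::nullopt,
// which models an output scalar with is_valid()==false.
std::optional<std::int64_t> sum_skipping_nulls(
  std::vector<std::optional<std::int32_t>> const& col)
{
  std::optional<std::int64_t> acc;  // stays empty until a valid row is seen
  for (auto const& v : col) {
    if (!v) { continue; }  // skip null rows
    acc = acc.value_or(0) + static_cast<std::int64_t>(*v);
  }
  return acc;
}
```

Accumulating into a wider type than the input mirrors the promotion described above; here the wider type is also the model's output type.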
If the input column is an arithmetic type, the output_type can be any arithmetic type. If the input column is a non-arithmetic type (e.g. timestamp or string), the output_type must match col.type(). If the reduction type is any or all, the output_type must be BOOL8.
Aggregation | Output Type | Init Value | Empty Input | Comments |
---|---|---|---|---|
SUM/PRODUCT | output_type | yes | NA | Input accumulated into output_type variable |
SUM_WITH_OVERFLOW | STRUCT{INT64,BOOL8} | yes | {null,false} | {sum, overflow_flag}, input must be INT64 |
SUM_OF_SQUARES | output_type | no | NA | Input accumulated into output_type variable |
MIN/MAX | col.type | yes | NA | Supports arithmetic, timestamp, duration, string types only |
ANY/ALL | BOOL8 | yes | True for ALL only | Checks for non-zero elements |
MEAN/VARIANCE/STD | FLOAT32/FLOAT64 | no | NA | output_type must be a float type |
MEDIAN/QUANTILE | output_type | no | NA | Exact value if output_type is FLOAT64. See cudf::quantile |
NUNIQUE | output_type | no | 1 if all-nulls | May process null rows |
NTH_ELEMENT | col.type | no | NA | |
BITWISE_AGG | col.type | no | NA | Supports only integral types |
HISTOGRAM/MERGE_HISTOGRAM | LIST of col.type | no | empty list returned | |
COLLECT_LIST/COLLECT_SET | LIST of col.type | no | empty list returned | |
TDIGEST/MERGE_TDIGEST | STRUCT | no | empty struct returned | tdigest scalar is returned |
HOST_UDF | output_type | yes | NA | Custom UDF could ignore output_type |
NA in the table indicates an output scalar with is_valid()==false.
std::invalid_argument | if reduction is called for a non-arithmetic output type and an operator other than min and max. |
std::invalid_argument | if the input column data type is not convertible to output_type. |
std::invalid_argument | if a min or max reduction is called and the output type does not match the input column data type. |
std::invalid_argument | if an any or all reduction is called and the output type is not BOOL8. |
std::invalid_argument | if a mean, var, or std reduction is called and the output_type is not floating point. |
std::invalid_argument | if a sum_with_overflow reduction is called and the input column type is not INT64 or the output_type is not STRUCT. |
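Four of the type rules listed above can be expressed as a small validation routine. The sketch below is a host-side model only: `agg_kind`, `type_kind`, and `validate_reduce_types` are hypothetical names, not libcudf types, treating BOOL8 as arithmetic is an assumption, and the convertibility and sum_with_overflow rules are left out.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for aggregation kinds and type ids; not libcudf types.
enum class agg_kind { SUM, MIN, MAX, ANY, ALL, MEAN, VARIANCE, STD };
enum class type_kind { INT32, INT64, FLOAT32, FLOAT64, BOOL8, TIMESTAMP, STRING };

bool is_floating(type_kind t)
{
  return t == type_kind::FLOAT32 || t == type_kind::FLOAT64;
}

// Treating BOOL8 as arithmetic is an assumption of this model.
bool is_arithmetic(type_kind t)
{
  return t != type_kind::TIMESTAMP && t != type_kind::STRING;
}

// Throws std::invalid_argument when one of the four modelled rules is violated.
void validate_reduce_types(agg_kind agg, type_kind input, type_kind output)
{
  bool const is_minmax = agg == agg_kind::MIN || agg == agg_kind::MAX;
  bool const is_anyall = agg == agg_kind::ANY || agg == agg_kind::ALL;
  bool const is_moment =
    agg == agg_kind::MEAN || agg == agg_kind::VARIANCE || agg == agg_kind::STD;

  if (!is_arithmetic(output) && !is_minmax) {
    throw std::invalid_argument("non-arithmetic output type requires min or max");
  }
  if (is_minmax && output != input) {
    throw std::invalid_argument("min/max output type must match the input type");
  }
  if (is_anyall && output != type_kind::BOOL8) {
    throw std::invalid_argument("any/all output type must be BOOL8");
  }
  if (is_moment && !is_floating(output)) {
    throw std::invalid_argument("mean/var/std output type must be floating point");
  }
}
```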
col | Input column view |
agg | Aggregation operator applied by the reduction |
output_type | The output scalar type |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned scalar's device memory |
std::unique_ptr<scalar> cudf::reduce(
  column_view const& col,
  reduce_aggregation const& agg,
  data_type output_type,
  std::optional<std::reference_wrapper<scalar const>> init,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Computes the reduction of the values in all rows of a column with an initial value.
Only sum, product, min, max, any, all, and sum_with_overflow reductions are supported. For sum_with_overflow, the initial value is added to the sum and overflow detection is performed throughout the entire computation.
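Conceptually, an initial value acts as the starting accumulator of the fold. A minimal host-side sketch, assuming nulls are skipped as in the overload without init: `reduce_with_init` is a hypothetical name, and returning the initial value unchanged for an empty input is a property of this model rather than something this page states about libcudf.

```cpp
#include <optional>
#include <vector>

// Host-side model: folds the non-null entries of col into an accumulator
// that starts at init. std::nullopt stands in for a null row.
// For an empty or all-null input this model simply returns init.
template <typename T, typename BinaryOp>
T reduce_with_init(std::vector<std::optional<T>> const& col, T init, BinaryOp op)
{
  T acc = init;
  for (auto const& v : col) {
    if (v) { acc = op(acc, *v); }  // null rows are skipped
  }
  return acc;
}
```

With a sum operator the initial value is added to the total; with a min operator it acts as an upper bound on the result.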
std::invalid_argument | if init is specified and the reduction is not sum, product, min, max, any, all, or sum_with_overflow. |
col | Input column view |
agg | Aggregation operator applied by the reduction |
output_type | The output scalar type |
init | The initial value of the reduction |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned scalar's device memory |
std::unique_ptr<column> cudf::scan(
  column_view const& input,
  scan_aggregation const& agg,
  scan_type inclusive,
  null_policy null_handling = null_policy::EXCLUDE,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Computes the scan of a column.
Null values are skipped by the operation; if the input element at index i is null, the output element at index i is also null.
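The null behaviour described above can be pictured with a host-side inclusive sum scan. This is a sketch in standard C++, not libcudf code: `inclusive_sum_scan_skip_nulls` and `nullable_i64` are hypothetical names, and `std::nullopt` stands in for a null element.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using nullable_i64 = std::optional<std::int64_t>;

// Host-side model of an inclusive sum scan that skips nulls:
// a null input leaves the running sum untouched and yields a null output
// at the same index.
std::vector<nullable_i64> inclusive_sum_scan_skip_nulls(std::vector<nullable_i64> const& in)
{
  std::vector<nullable_i64> out(in.size());  // every slot starts as null
  std::int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!in[i]) { continue; }  // output at i stays null
    running += *in[i];
    out[i] = running;
  }
  return out;
}
```

For the input {1, null, 3, 4} the model produces {1, null, 4, 8}.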
cudf::logic_error | if the column data type is not a numeric type. |
[in] | input | The input column view for the scan |
[in] | agg | Aggregation operator applied by the scan |
[in] | inclusive | Applies an inclusive scan if scan_type::INCLUSIVE, or an exclusive scan if scan_type::EXCLUSIVE. |
[in] | null_handling | Exclude null values when computing the result if null_policy::EXCLUDE. Include nulls if null_policy::INCLUDE. Any operation with a null results in a null. |
[in] | stream | CUDA stream used for device memory operations and kernel launches |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
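The difference between the two scan types, and one reading of null_policy::INCLUDE, can be sketched on the host with the C++17 standard library. None of this is libcudf code: `sum_scan`, `inclusive_sum_scan_include_nulls`, and `nullable_i64` are hypothetical names, and treating every output from the first null onward as null is an interpretation of "any operation with a null results in a null".

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

using nullable_i64 = std::optional<std::int64_t>;

// Inclusive scan: out[i] includes in[i].
// Exclusive scan: out[i] covers only the elements before i, starting from 0.
std::vector<std::int64_t> sum_scan(std::vector<std::int64_t> const& in, bool inclusive)
{
  std::vector<std::int64_t> out(in.size());
  if (inclusive) {
    std::inclusive_scan(in.begin(), in.end(), out.begin());
  } else {
    std::exclusive_scan(in.begin(), in.end(), out.begin(), std::int64_t{0});
  }
  return out;
}

// Model of an inclusive sum scan when nulls are included: once a null has
// entered the running result, every later output is null as well.
std::vector<nullable_i64> inclusive_sum_scan_include_nulls(std::vector<nullable_i64> const& in)
{
  std::vector<nullable_i64> out(in.size());  // every slot starts as null
  std::int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!in[i]) { break; }  // remaining outputs stay null
    running += *in[i];
    out[i] = running;
  }
  return out;
}
```

For {1, 2, 3, 4} the inclusive scan gives {1, 3, 6, 10} and the exclusive scan gives {0, 1, 3, 6}.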
std::unique_ptr<column> cudf::segmented_reduce(
  column_view const& segmented_values,
  device_span<size_type const> offsets,
  segmented_reduce_aggregation const& agg,
  data_type output_type,
  null_policy null_handling,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Compute reduction of each segment in the input column.
This function does not detect overflows in reductions. When output_type does not match segmented_values.type(), the input values may be promoted to int64_t or double for computing aggregations and then cast to output_type before returning.
Null values are treated as identities during reduction.
If the segment is empty, the row corresponding to the result of the segment is null.
If any index in offsets is out of bounds for segmented_values, the behavior is undefined.
If the input column has an arithmetic type, output_type can be any arithmetic type. If the input column has a non-arithmetic type (e.g. timestamp), output_type must be the same as the input type.
If input is not empty, the result is always nullable.
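The offsets convention and the empty-segment rule can be illustrated with a host-side segmented sum. This is a minimal sketch in standard C++, not the libcudf implementation: `segmented_sum` is a hypothetical name, the input has no nulls, a `std::nullopt` output stands in for a null row, and the offsets are assumed to be in bounds (the page states that out-of-bounds offsets are undefined behavior).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Host-side model of a segmented SUM.
// Segment i covers values[offsets[i]] up to, but not including,
// values[offsets[i + 1]]. An empty segment yields std::nullopt.
// Offsets are assumed to be in bounds and non-decreasing.
std::vector<std::optional<std::int64_t>> segmented_sum(
  std::vector<std::int32_t> const& values, std::vector<std::int32_t> const& offsets)
{
  std::vector<std::optional<std::int64_t>> out;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offsets[i]);
    auto const end   = static_cast<std::size_t>(offsets[i + 1]);
    if (begin == end) {
      out.push_back(std::nullopt);  // empty segment -> null row
      continue;
    }
    std::int64_t sum = 0;
    for (std::size_t j = begin; j < end; ++j) { sum += values[j]; }
    out.push_back(sum);
  }
  return out;
}
```

With values {1, 2, 3, 4, 5, 6} and offsets {0, 2, 2, 6} there are three segments, and the model returns {3, null, 18}.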
cudf::logic_error | if reduction is called for a non-arithmetic output type and an operator other than min and max. |
cudf::logic_error | if the input column data type is not convertible to output_type. |
cudf::logic_error | if a min or max reduction is called and the output_type does not match the input column data type. |
cudf::logic_error | if an any or all reduction is called and the output_type is not BOOL8. |
segmented_values | Column view of segmented inputs |
offsets | Each segment's offset into segmented_values. A list of offsets with size num_segments + 1. The size of the i-th segment is offsets[i+1] - offsets[i]. |
agg | Aggregation operator applied by the reduction |
output_type | The output column type |
null_handling | If INCLUDE, the reduction is valid if all elements in a segment are valid, otherwise null. If EXCLUDE, the reduction is valid if any element in the segment is valid, otherwise null. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
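The effect of null_handling on each segment's validity can be modelled on the host as follows. This is a sketch, not libcudf code: `null_policy_model`, `nullable_i64`, and `segmented_sum_nullable` are hypothetical names, `std::nullopt` stands in for a null, and empty segments are mapped to null under both policies, following the rule that an empty segment produces a null row.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the null policy; not the libcudf enum.
enum class null_policy_model { EXCLUDE, INCLUDE };
using nullable_i64 = std::optional<std::int64_t>;

// Host-side model of a segmented SUM over nullable values.
// INCLUDE: a segment is valid only if all of its elements are valid.
// EXCLUDE: a segment is valid if at least one of its elements is valid.
// Empty segments are null under both policies.
std::vector<nullable_i64> segmented_sum_nullable(std::vector<nullable_i64> const& values,
                                                 std::vector<std::int32_t> const& offsets,
                                                 null_policy_model policy)
{
  std::vector<nullable_i64> out;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offsets[i]);
    auto const end   = static_cast<std::size_t>(offsets[i + 1]);
    std::int64_t sum        = 0;
    std::size_t valid_count = 0;
    for (std::size_t j = begin; j < end; ++j) {
      if (values[j]) {
        sum += *values[j];  // nulls contribute nothing to the sum
        ++valid_count;
      }
    }
    std::size_t const size = end - begin;
    bool const all_valid   = valid_count == size;
    bool const any_valid   = valid_count > 0;
    bool const is_valid =
      size > 0 && (policy == null_policy_model::INCLUDE ? all_valid : any_valid);
    if (is_valid) {
      out.push_back(sum);
    } else {
      out.push_back(std::nullopt);
    }
  }
  return out;
}
```

For values {1, null, 3, null, null, 6} and offsets {0, 2, 3, 5, 6}, the model returns {1, 3, null, 6} under EXCLUDE and {null, 3, null, 6} under INCLUDE.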
std::unique_ptr<column> cudf::segmented_reduce(
  column_view const& segmented_values,
  device_span<size_type const> offsets,
  segmented_reduce_aggregation const& agg,
  data_type output_type,
  null_policy null_handling,
  std::optional<std::reference_wrapper<scalar const>> init,
  rmm::cuda_stream_view stream = cudf::get_default_stream(),
  rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Compute reduction of each segment in the input column with an initial value. Only SUM, PRODUCT, MIN, MAX, ANY, and ALL aggregations are supported.
segmented_values | Column view of segmented inputs |
offsets | Each segment's offset into segmented_values. A list of offsets with size num_segments + 1. The size of the i-th segment is offsets[i+1] - offsets[i]. |
agg | Aggregation operator applied by the reduction |
output_type | The output column type |
null_handling | If INCLUDE, the reduction is valid if all elements in a segment are valid, otherwise null. If EXCLUDE, the reduction is valid if any element in the segment is valid, otherwise null. |
init | The initial value of the reduction |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
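One way to picture the initial value in the segmented case is that every segment's fold starts from the same init. The host-side sketch below models a segmented MAX over a column without nulls. `segmented_max_with_init` is a hypothetical name, not a libcudf function. Empty segments are not really modelled: this page does not say what the init overload returns for them, and here such a segment would simply yield init, which may differ from libcudf.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of a segmented MAX with an initial value.
// Each segment's accumulator starts at init, so init acts as a lower bound
// on every result. Offsets are assumed to be in bounds and non-decreasing.
std::vector<std::int64_t> segmented_max_with_init(std::vector<std::int32_t> const& values,
                                                  std::vector<std::int32_t> const& offsets,
                                                  std::int64_t init)
{
  std::vector<std::int64_t> out;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offsets[i]);
    auto const end   = static_cast<std::size_t>(offsets[i + 1]);
    std::int64_t acc = init;
    for (std::size_t j = begin; j < end; ++j) {
      acc = std::max(acc, static_cast<std::int64_t>(values[j]));
    }
    out.push_back(acc);
  }
  return out;
}
```

With values {1, 5, 3, 9, 2} and offsets {0, 2, 5}, an init of 4 gives {5, 9}, while an init of 10 gives {10, 10}.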