Files | |
file | quantiles.hpp |
Functions | |
std::unique_ptr< column > | cudf::quantile (column_view const &input, std::vector< double > const &q, interpolation interp=interpolation::LINEAR, column_view const &ordered_indices={}, bool exact=true, rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
Computes quantiles with interpolation. | |
std::unique_ptr< table > | cudf::quantiles (table_view const &input, std::vector< double > const &q, interpolation interp=interpolation::NEAREST, cudf::sorted is_input_sorted=sorted::NO, std::vector< order > const &column_order={}, std::vector< null_order > const &null_precedence={}, rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
Returns the rows of the input corresponding to the requested quantiles. | |
std::unique_ptr< column > | cudf::percentile_approx (tdigest::tdigest_column_view const &input, column_view const &percentiles, rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) |
Calculate approximate percentiles on an input tdigest column. | |
std::unique_ptr<column> cudf::percentile_approx (tdigest::tdigest_column_view const & input, column_view const & percentiles, rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Calculate approximate percentiles on an input tdigest column.
tdigest (https://arxiv.org/pdf/1902.04023.pdf) columns are produced specifically by the TDIGEST and MERGE_TDIGEST aggregations. Each such column is a compressed representation of a very large input data set that can be queried for quantile information.
Produces a LIST column where each row i holds the output of querying the corresponding tdigest in input row i. The length of each output list is the number of percentiles specified in percentiles.
input | tdigest input data. One tdigest per row |
percentiles | Desired percentiles in range [0, 1] |
mr | Device memory resource used to allocate the returned column's device memory |
cudf::logic_error | if input is not a valid tdigest column. |
cudf::logic_error | if percentiles is not a FLOAT64 column. |
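The output shape described above (one list per input digest, one entry per requested percentile) can be illustrated with a small host-side sketch. This is not libcudf's implementation: the `centroid` struct, `query_digest`, and `percentile_approx_sketch` are hypothetical names, and the query is a simplified centroid interpolation that assumes positive weights, centroids sorted by mean, and no stored min/max.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one tdigest centroid: a cluster mean and its weight.
struct centroid {
  double mean;
  double weight;
};

// Simplified query of one digest (assumption: centroids sorted by mean, weights > 0).
// Each centroid's mass is treated as centered at (weight before it) + weight / 2,
// and the result is linearly interpolated between neighboring centroid means.
double query_digest(std::vector<centroid> const& digest, double p)
{
  assert(!digest.empty());
  double total = 0.0;
  for (auto const& c : digest) { total += c.weight; }
  double const target = p * total;

  double cumulative  = 0.0;  // total weight of centroids already visited
  double prev_center = 0.0;  // center position of the previous centroid
  for (std::size_t i = 0; i < digest.size(); ++i) {
    double const center = cumulative + digest[i].weight / 2.0;
    if (target <= center) {
      if (i == 0) { return digest[0].mean; }  // below the first center: clamp
      double const t = (target - prev_center) / (center - prev_center);
      return digest[i - 1].mean + t * (digest[i].mean - digest[i - 1].mean);
    }
    prev_center = center;
    cumulative += digest[i].weight;
  }
  return digest.back().mean;  // above the last center: clamp
}

// One output list per digest; each list has percentiles.size() entries.
std::vector<std::vector<double>> percentile_approx_sketch(
  std::vector<std::vector<centroid>> const& digests, std::vector<double> const& percentiles)
{
  std::vector<std::vector<double>> out;
  out.reserve(digests.size());
  for (auto const& digest : digests) {
    std::vector<double> row;
    row.reserve(percentiles.size());
    for (double p : percentiles) { row.push_back(query_digest(digest, p)); }
    out.push_back(row);
  }
  return out;
}
```

Calling `percentile_approx_sketch` with two digests and three percentiles yields two lists of three values each, mirroring the row-for-row correspondence between the input column and the returned LIST column.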
std::unique_ptr<column> cudf::quantile (column_view const & input, std::vector< double > const & q, interpolation interp = interpolation::LINEAR, column_view const & ordered_indices = {}, bool exact = true, rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Computes quantiles with interpolation.
Computes the specified quantiles by interpolating between the values on either side of each quantile, using the interpolation strategy specified in interp.
[in] | input | Column from which to compute quantile values |
[in] | q | Specified quantiles in range [0, 1] |
[in] | interp | Strategy used to select between values adjacent to a specified quantile. |
[in] | ordered_indices | Column containing the sorted order of input. If the column is empty, all input values are used in existing order. Indices must be in range [0, input.size()), but are not required to be unique. Values not indexed by this column will be ignored. |
[in] | exact | If true, returns doubles. If false, returns same type as input. |
[in] | mr | Device memory resource used to allocate the returned column's device memory |
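As a rough illustration of how interp and ordered_indices interact, the following host-side sketch computes a single quantile from a plain vector. It is not libcudf code: `interp_kind` and `quantile_sketch` are hypothetical names, and the fractional position `q * (n - 1)` and the tie handling for NEAREST (delegated to `std::nearbyint`) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for some of the interpolation strategies named on this page.
enum class interp_kind { LINEAR, LOWER, HIGHER, NEAREST };

// `values` is the data; `order` lists indices into `values` in sorted order.
// An empty `order` means the values are used in their existing order.
double quantile_sketch(std::vector<double> const& values,
                       std::vector<std::size_t> const& order,
                       double q,
                       interp_kind interp)
{
  std::size_t const n = order.empty() ? values.size() : order.size();
  assert(n > 0);
  auto at = [&](std::size_t i) { return order.empty() ? values[i] : values[order[i]]; };

  // Assumed convention: fractional position q * (n - 1), with q clamped to [0, 1].
  double const pos  = std::clamp(q, 0.0, 1.0) * static_cast<double>(n - 1);
  auto const lower  = static_cast<std::size_t>(std::floor(pos));
  auto const higher = static_cast<std::size_t>(std::ceil(pos));
  double const frac = pos - static_cast<double>(lower);

  switch (interp) {
    case interp_kind::LOWER: return at(lower);
    case interp_kind::HIGHER: return at(higher);
    case interp_kind::NEAREST: return at(static_cast<std::size_t>(std::nearbyint(pos)));
    case interp_kind::LINEAR: return at(lower) + frac * (at(higher) - at(lower));
  }
  return at(lower);  // not reached; keeps compilers from warning
}
```

Because only the entries listed in `order` are visited, values that are not indexed are ignored, which matches the behavior described for ordered_indices. The sketch always returns a double, corresponding to exact = true.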
std::unique_ptr<table> cudf::quantiles (table_view const & input, std::vector< double > const & q, interpolation interp = interpolation::NEAREST, cudf::sorted is_input_sorted = sorted::NO, std::vector< order > const & column_order = {}, std::vector< null_order > const & null_precedence = {}, rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns the rows of the input corresponding to the requested quantiles.
Quantiles are cut points that divide the range of a dataset into continuous intervals. For example, quartiles are the three cut points that divide a dataset into four equal-sized groups. See https://en.wikipedia.org/wiki/Quantile
The indices used to gather rows are computed by interpolating between the row indices on either side of the desired quantile. Since some columns may be non-arithmetic, interpolation between rows is limited to non-arithmetic strategies.
Non-arithmetic interpolation strategies include HIGHER, LOWER, and NEAREST.
Quantiles <= 0 correspond to row 0 (the first row). Quantiles >= 1 correspond to row input.size() - 1 (the last row).
input | Table used to compute quantile rows |
q | Desired quantiles in range [0, 1] |
interp | Strategy used to select between the two rows on either side of the desired quantile. |
is_input_sorted | Indicates if the input has been pre-sorted |
column_order | The desired sort order for each column |
null_precedence | The desired order of null compared to other elements |
mr | Device memory resource used to allocate the returned table's device memory |
cudf::logic_error | if interp is an arithmetic interpolation strategy |
cudf::logic_error | if input is empty |
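The row selection described for cudf::quantiles can be sketched on the host as a mapping from each requested quantile to a row index of the sorted input. This is an illustration only: `row_interp` and `quantile_row_indices` are hypothetical names, the position `q * (num_rows - 1)` is an assumed convention, and `std::invalid_argument` stands in for cudf::logic_error.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the non-arithmetic strategies: HIGHER, LOWER, NEAREST.
enum class row_interp { LOWER, HIGHER, NEAREST };

// Returns, for each quantile, the index of the row that would be gathered
// from an already sorted table with `num_rows` rows.
std::vector<std::size_t> quantile_row_indices(std::size_t num_rows,
                                              std::vector<double> const& q,
                                              row_interp interp)
{
  if (num_rows == 0) { throw std::invalid_argument("input is empty"); }

  std::vector<std::size_t> out;
  out.reserve(q.size());
  for (double quant : q) {
    if (quant <= 0.0) {  // quantiles <= 0 map to the first row
      out.push_back(0);
      continue;
    }
    if (quant >= 1.0) {  // quantiles >= 1 map to the last row
      out.push_back(num_rows - 1);
      continue;
    }
    // Assumed convention: fractional row position q * (num_rows - 1).
    double const pos = quant * static_cast<double>(num_rows - 1);
    switch (interp) {
      case row_interp::LOWER: out.push_back(static_cast<std::size_t>(std::floor(pos))); break;
      case row_interp::HIGHER: out.push_back(static_cast<std::size_t>(std::ceil(pos))); break;
      case row_interp::NEAREST:
        out.push_back(static_cast<std::size_t>(std::nearbyint(pos)));
        break;
    }
  }
  return out;
}
```

Every strategy here yields a whole row index, so entire rows can be gathered without combining values from two rows. That is why the arithmetic strategies, which would blend values, are rejected for tables that may contain non-arithmetic columns.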