Files

file  quantiles.hpp
 

Functions

std::unique_ptr< column > cudf::quantile (column_view const &input, std::vector< double > const &q, interpolation interp=interpolation::LINEAR, column_view const &ordered_indices={}, bool exact=true, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Computes quantiles with interpolation.
 
std::unique_ptr< table > cudf::quantiles (table_view const &input, std::vector< double > const &q, interpolation interp=interpolation::NEAREST, cudf::sorted is_input_sorted=sorted::NO, std::vector< order > const &column_order={}, std::vector< null_order > const &null_precedence={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the rows of the input corresponding to the requested quantiles.
 
std::unique_ptr< column > cudf::percentile_approx (tdigest::tdigest_column_view const &input, column_view const &percentiles, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Calculate approximate percentiles on an input tdigest column.
 

Function Documentation

◆ percentile_approx()

std::unique_ptr<column> cudf::percentile_approx ( tdigest::tdigest_column_view const &  input,
column_view const &  percentiles,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Calculate approximate percentiles on an input tdigest column.

tdigest (https://arxiv.org/pdf/1902.04023.pdf) columns are produced specifically by the TDIGEST and MERGE_TDIGEST aggregations. These columns represent compressed representations of a very large input data set that can be queried for quantile information.

Produces a LIST column where each row i holds the output of querying the tdigest in input row i. The length of each output list is the number of percentiles specified in percentiles.
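
To make the idea of querying a compressed digest concrete, the following is a minimal host-side sketch in standard C++. It assumes a digest is a list of (mean, weight) centroids sorted by mean and interpolates linearly between centroid centers. The names Centroid and tdigest_quantile_sketch are hypothetical and exist only for this illustration. This is one common simplified query scheme, not necessarily the algorithm cudf uses on the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical centroid type for this sketch; it is not a cudf type.
struct Centroid {
  double mean;    // mean of the values merged into this centroid
  double weight;  // number of values merged into this centroid (assumed > 0)
};

// Approximates the p-quantile (p in [0, 1]) of a non-empty digest whose
// centroids are sorted by mean. Each centroid is treated as centered at the
// midpoint of its cumulative weight range, and the result is linearly
// interpolated between the two centroid centers that bracket the target rank.
double tdigest_quantile_sketch(std::vector<Centroid> const& centroids, double p)
{
  double total = 0.0;
  for (auto const& c : centroids) { total += c.weight; }
  double const target = p * total;

  double cumulative  = 0.0;  // total weight of centroids before index i
  double prev_center = 0.0;  // center rank of centroid i - 1
  for (std::size_t i = 0; i < centroids.size(); ++i) {
    double const center = cumulative + centroids[i].weight / 2.0;
    if (target <= center) {
      // Below the first center there is nothing to interpolate with.
      if (i == 0) { return centroids[0].mean; }
      double const t = (target - prev_center) / (center - prev_center);
      return centroids[i - 1].mean + t * (centroids[i].mean - centroids[i - 1].mean);
    }
    cumulative += centroids[i].weight;
    prev_center = center;
  }
  // Above the last center: return the last centroid's mean.
  return centroids.back().mean;
}
```

With one such digest per input row, evaluating this function once per requested percentile yields a list whose length equals the number of percentiles, which mirrors the output shape described above.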

Parameters
input  tdigest input data. One tdigest per row
percentiles  Desired percentiles in range [0, 1]
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Exceptions
cudf::logic_error  if input is not a valid tdigest column.
cudf::logic_error  if percentiles is not a FLOAT64 column.
Returns
LIST Column containing requested percentile values as FLOAT64

◆ quantile()

std::unique_ptr<column> cudf::quantile ( column_view const &  input,
std::vector< double > const &  q,
interpolation  interp = interpolation::LINEAR,
column_view const &  ordered_indices = {},
bool  exact = true,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Computes quantiles with interpolation.

Computes the specified quantiles by interpolating values between which they lie, using the interpolation strategy specified in interp.
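
The interpolation step can be illustrated with a small host-side sketch in standard C++. It assumes the input is already sorted ascending and that the fractional position of quantile q is q * (n - 1). The enum Interp and the function quantile_sketch are hypothetical names for this illustration and are not part of cudf. The tie-breaking rule for Nearest is also an assumption of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical local enum for this sketch; it is not cudf::interpolation.
enum class Interp { Linear, Lower, Higher, Nearest };

// Returns the q-quantile of `sorted` (ascending, non-empty), assuming the
// fractional position is q * (n - 1). q is clamped to [0, 1].
double quantile_sketch(std::vector<double> const& sorted, double q, Interp interp)
{
  double const clamped = std::min(1.0, std::max(0.0, q));
  double const pos     = clamped * static_cast<double>(sorted.size() - 1);
  auto const lo        = static_cast<std::size_t>(std::floor(pos));
  auto const hi        = static_cast<std::size_t>(std::ceil(pos));
  double const frac    = pos - static_cast<double>(lo);

  switch (interp) {
    case Interp::Lower: return sorted[lo];    // value just below the position
    case Interp::Higher: return sorted[hi];   // value just above the position
    case Interp::Nearest:                     // closer neighbor; ties go up here
      return frac < 0.5 ? sorted[lo] : sorted[hi];
    case Interp::Linear:                      // weighted blend of both neighbors
      return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
  }
  return sorted[lo];  // unreachable for valid enum values
}
```

For the sorted values {10, 20, 30, 40} and q = 0.5, the position is 1.5, so Lower selects 20, Higher selects 30, and Linear blends the two into 25.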

Parameters
[in]  input  Column from which to compute quantile values
[in]  q  Specified quantiles in range [0, 1]
[in]  interp  Strategy used to select between values adjacent to a specified quantile.
[in]  ordered_indices  Column containing the sorted order of input. If the column is empty, all input values are used in existing order. Indices must be in range [0, input.size()), but are not required to be unique. Values not indexed by this column will be ignored.
[in]  exact  If true, returns doubles. If false, returns same type as input.
[in]  stream  CUDA stream used for device memory operations and kernel launches
[in]  mr  Device memory resource used to allocate the returned column's device memory
Returns
Column of specified quantiles, with nulls for indeterminable values

◆ quantiles()

std::unique_ptr<table> cudf::quantiles ( table_view const &  input,
std::vector< double > const &  q,
interpolation  interp = interpolation::NEAREST,
cudf::sorted  is_input_sorted = sorted::NO,
std::vector< order > const &  column_order = {},
std::vector< null_order > const &  null_precedence = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the rows of the input corresponding to the requested quantiles.

Quantiles are cut points that divide the range of a dataset into continuous intervals. For example, quartiles are the three cut points that divide a dataset into four equal-sized groups. See https://en.wikipedia.org/wiki/Quantile

The indices used to gather rows are computed by interpolating between the indices on either side of the desired quantile. Since some columns may be non-arithmetic, interpolation between rows is limited to non-arithmetic strategies.

Non-arithmetic interpolation strategies include HIGHER, LOWER, and NEAREST.

Quantiles <= 0 correspond to row 0 (the first row). Quantiles >= 1 correspond to row input.size() - 1 (the last row).
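
The row selection described above can be sketched in standard C++ as follows. The sketch assumes the fractional row position of quantile q is q * (num_rows - 1) and that the table is non-empty and already in sorted order. RowInterp and quantile_row_index are hypothetical names used only for this illustration, and the tie-breaking rule for Nearest is an assumption rather than documented cudf behavior.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical local enum for this sketch; it is not cudf::interpolation.
enum class RowInterp { Lower, Higher, Nearest };

// Returns the index of the row to gather for quantile q from a sorted table
// with num_rows rows (num_rows must be > 0). No arithmetic is performed on
// row values, only on the row position, so any column type can be gathered.
std::size_t quantile_row_index(std::size_t num_rows, double q, RowInterp interp)
{
  // q <= 0 maps to the first row and q >= 1 maps to the last row.
  double const clamped = std::min(1.0, std::max(0.0, q));
  double const pos     = clamped * static_cast<double>(num_rows - 1);
  auto const lo        = static_cast<std::size_t>(std::floor(pos));
  auto const hi        = static_cast<std::size_t>(std::ceil(pos));

  switch (interp) {
    case RowInterp::Lower: return lo;
    case RowInterp::Higher: return hi;
    case RowInterp::Nearest:  // closer neighbor; ties go to the higher row here
      return (pos - static_cast<double>(lo)) < 0.5 ? lo : hi;
  }
  return lo;  // unreachable for valid enum values
}
```

For a five-row table and q = 0.3, the position is about 1.2, so Lower and Nearest both pick row 1 while Higher picks row 2.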

Parameters
input  Table used to compute quantile rows
q  Desired quantiles in range [0, 1]
interp  Strategy used to select between the two rows on either side of the desired quantile.
is_input_sorted  Indicates if the input has been pre-sorted
column_order  The desired sort order for each column
null_precedence  The desired order of null compared to other elements
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table's device memory
Returns
Table of specified quantiles, with nulls for indeterminable values
Exceptions
cudf::logic_error  if interp is an arithmetic interpolation strategy
cudf::logic_error  if input is empty