cudf.DataFrame.quantile

DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation=None, columns=None, exact=True, method='single')

Return values at the given quantile.

Parameters:
q : float or array-like

0 <= q <= 1, the quantile(s) to compute

axis : int

axis is a NON-FUNCTIONAL parameter

numeric_only : bool, default True

If False, the quantile of datetime and timedelta data will be computed as well.

interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}

This parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j. Default is 'linear' for method="single", and 'nearest' for method="table".

  • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

  • lower: i.

  • higher: j.

  • nearest: i or j whichever is nearest.

  • midpoint: (i + j) / 2.
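The five rules above can be sketched in plain Python for a single column. This is only an illustration of the formulas, not cuDF's GPU implementation. The helper name `quantile_1d` is hypothetical, the fractional index `q * (n - 1)` is the convention that matches the outputs in the Examples section, and the tie-breaking used for `nearest` (ties go to i) is an assumption that the text above does not specify.

```python
import math


def quantile_1d(values, q, interpolation="linear"):
    """Hypothetical helper: quantile of one column under the five rules."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")

    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)        # index of the lower neighbour i
    hi = math.ceil(pos)         # index of the upper neighbour j
    i, j = data[lo], data[hi]
    fraction = pos - lo         # fractional part of the index

    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: an exact tie (fraction == 0.5) resolves to i.
        return i if fraction <= 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")
```

For the values [1, 10, 100, 100] and q = 0.1, the fractional index is 0.3, so the linear rule gives 1 + (10 - 1) * 0.3 = 3.7, while `lower` gives 1 and `higher` gives 10.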

columns : list of str

List of column names to include.

exact : bool, default True

Whether to use the exact quantile algorithm (True) or an approximate one (False).

method : {‘single’, ‘table’}, default ‘single’

Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.
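A rough way to picture the ‘table’ mode, assuming it orders whole rows lexicographically by column and returns an existing row (which is how the pandas parameter of the same name behaves; the text above does not spell this out for cuDF). The helper `table_quantile_rows` is hypothetical and covers only ‘lower’ and ‘higher’, because the tie-breaking rule for ‘nearest’ is not specified here.

```python
import math


def table_quantile_rows(rows, q, interpolation="lower"):
    """Hypothetical helper: pick one whole row as the q-th quantile."""
    if interpolation not in ("lower", "higher"):
        raise ValueError("this sketch covers only 'lower' and 'higher'")
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    ordered = sorted(rows)      # tuples compare lexicographically
    if not ordered:
        raise ValueError("rows must not be empty")

    pos = q * (len(ordered) - 1)    # fractional row position
    idx = math.floor(pos) if interpolation == "lower" else math.ceil(pos)
    return ordered[idx]             # always an existing row, never a blend
```

Because a row is returned intact, no value is interpolated between rows, which is consistent with ‘linear’ and ‘midpoint’ being disallowed in this mode.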

Returns:
Series or DataFrame

If q is an array or numeric_only is set to False, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.

If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Examples

>>> import cupy as cp
>>> import cudf
>>> df = cudf.DataFrame(cp.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df
   a    b
0  1    1
1  2   10
2  3  100
3  4  100
>>> df.quantile(0.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Pandas Compatibility Note

pandas.DataFrame.quantile()

One notable difference from pandas arises when the DataFrame holds non-numeric types and pandas would return a Series. cuDF returns a DataFrame in that case, because a cuDF Series does not support mixed types.