reduce

pylibcudf.reduce.ScanType

Selects whether a scan is inclusive or exclusive. See also scan_type.

Enum members

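INCLUSIVE: each output element includes the input element at the same position.
EXCLUSIVE: each output element covers only the input elements before it.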
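The difference between the two members is easiest to see on a running sum. The following is a minimal CPU-only sketch in plain Python, not the pylibcudf API; the helper name `sum_scan` is hypothetical, and it assumes that an exclusive sum scan places the identity value 0 in the first slot.

```python
from itertools import accumulate
from operator import add


def sum_scan(values, inclusive=True):
    """CPU reference for a SUM scan; illustrative only, not the pylibcudf API."""
    if not values:
        return []
    # Running totals: element i is values[0] + ... + values[i].
    running = list(accumulate(values, add))
    if inclusive:
        # INCLUSIVE: each output element includes the input at the same position.
        return running
    # EXCLUSIVE: each output element covers only the earlier inputs,
    # so the first slot holds the sum identity (0) and the last total is dropped.
    return [0] + running[:-1]


print(sum_scan([1, 2, 3, 4]))                   # [1, 3, 6, 10]
print(sum_scan([1, 2, 3, 4], inclusive=False))  # [0, 1, 3, 6]
```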

pylibcudf.reduce.distinct_count(Column source, null_policy null_handling, nan_policy nan_handling, Stream stream=None) → size_type

Returns the number of distinct elements in the input column.

For details, see cudf::distinct_count().

Parameters:

source (Column): The input column whose distinct elements are counted.
null_handling (null_policy): Whether to include or exclude nulls from the count.
nan_handling (nan_policy): Whether NaN values are treated as nulls or as valid values.
stream (Stream | None): CUDA stream on which to perform the operation.

Returns:

size_type: The number of distinct elements in the input column.
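A minimal pure-Python sketch of how the two policies interact, under stated assumptions: `None` stands in for a null, the boolean flags `include_nulls` and `nan_is_null` are hypothetical stand-ins for null_policy (INCLUDE/EXCLUDE) and nan_policy (NAN_IS_NULL/NAN_IS_VALID), and all nulls (and all NaNs) are assumed to compare equal to each other. It models the semantics only and does not call pylibcudf; `distinct_count_ref` is a hypothetical name.

```python
import math


def distinct_count_ref(values, include_nulls, nan_is_null):
    """Model of distinct counting; None marks a null. Illustrative only."""
    seen = set()
    has_null = False
    has_nan = False
    for v in values:
        v_is_nan = isinstance(v, float) and math.isnan(v)
        if v is None or (nan_is_null and v_is_nan):
            # Nulls (and NaNs treated as nulls) all count as a single value.
            has_null = True
        elif v_is_nan:
            # NaN != NaN in Python, so track NaNs with a flag instead of the set.
            has_nan = True
        else:
            seen.add(v)
    return len(seen) + int(has_nan) + int(include_nulls and has_null)


data = [1.0, 1.0, 2.0, None, None, float("nan"), float("nan")]
print(distinct_count_ref(data, include_nulls=True, nan_is_null=False))  # 4
print(distinct_count_ref(data, include_nulls=False, nan_is_null=True))  # 2
```

With nulls included and NaNs valid, the sketch counts 1.0, 2.0, NaN and null; excluding nulls while treating NaNs as nulls leaves only 1.0 and 2.0.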

pylibcudf.reduce.is_valid_reduce_aggregation(DataType source, Aggregation agg) → bool

Return whether an aggregation is supported for reductions on a given data type.

Parameters:

source (DataType): The data type of the column the aggregation is performed on.
agg (Aggregation): The aggregation to check.

Returns:

bool: True if the aggregation is supported for the given data type.

pylibcudf.reduce.minmax(Column col, Stream stream=None, DeviceMemoryResource mr=None) → tuple

Compute the minimum and maximum of a column.

For details, see the cudf::minmax documentation.

Parameters:

col (Column): The column whose minimum and maximum are computed.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the device memory of the returned scalars.

Returns:

tuple: A tuple of two Scalars, the first being the minimum and the second being the maximum.
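A plain-Python sketch of the shape of the result, assuming nulls (modeled as `None`) are skipped and that a column with no valid entries yields null results for both scalars. The helper `minmax_ref` is hypothetical and only mirrors the (minimum, maximum) ordering of the returned tuple; it is not the pylibcudf implementation.

```python
def minmax_ref(values):
    """Return (minimum, maximum) over non-null entries; None marks a null."""
    valid = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not valid:
        # Assumption: no valid entries -> both results are null.
        return (None, None)
    return (min(valid), max(valid))


print(minmax_ref([3, None, 7, 1]))  # (1, 7)
```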

pylibcudf.reduce.reduce(Column col, Aggregation agg, DataType data_type, Scalar init=None, Stream stream=None, DeviceMemoryResource mr=None) → Scalar

Perform a reduction on a column.

For details, see the cudf::reduce documentation.

Parameters:

col (Column): The column to perform the reduction on.
agg (Aggregation): The aggregation to perform.
data_type (DataType): The data type of the result.
init (Scalar | None): The initial value for the reduction.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the device memory of the returned scalar.

Returns:

Scalar: The result of the reduction.
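The role of `init` can be sketched with `functools.reduce`. This is a CPU-only illustration with a hypothetical helper `reduce_ref`: `op` stands in for the aggregation, `result_type` is a stand-in for data_type, and skipping nulls (`None`) and returning a null result for empty input without `init` are assumptions rather than documented behavior.

```python
from functools import reduce as fold
from operator import add, mul


def reduce_ref(values, op, init=None, result_type=None):
    """Fold `op` over the non-null values, optionally seeded with `init`."""
    valid = [v for v in values if v is not None]  # assumption: nulls are skipped
    if init is not None:
        # The initial value seeds the fold, so it contributes to the result.
        result = fold(op, valid, init)
    elif valid:
        result = fold(op, valid)
    else:
        return None  # assumption: empty input without init gives a null result
    # Stand-in for data_type: convert the result to the requested output type.
    return result_type(result) if result_type is not None else result


print(reduce_ref([1, 2, 3, 4], add))                  # 10
print(reduce_ref([1, 2, 3, 4], add, init=100))        # 110
print(reduce_ref([1, 2, 3], mul, result_type=float))  # 6.0
```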

pylibcudf.reduce.scan(Column col, Aggregation agg, scan_type inclusive, Stream stream=None, DeviceMemoryResource mr=None) → Column

Perform a scan on a column.

For details, see the cudf::scan documentation.

Parameters:

col (Column): The column to perform the scan on.
agg (Aggregation): The aggregation to perform.
inclusive (scan_type): Whether to perform an inclusive or an exclusive scan.
stream (Stream | None): CUDA stream on which to perform the operation.
mr (DeviceMemoryResource | None): Device memory resource used to allocate the device memory of the returned column.

Returns:

Column: The result of the scan.
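The scan aggregation need not be a sum. This plain-Python sketch (hypothetical helper `inclusive_scan_ref`, built on `itertools.accumulate`; not the pylibcudf API) computes a running maximum and a running product, and checks that the last element of an inclusive scan equals the reduction of the whole column.

```python
from functools import reduce
from itertools import accumulate


def inclusive_scan_ref(values, op):
    """Inclusive scan: output[i] is op applied over values[0..i]."""
    return list(accumulate(values, op))


data = [3, 1, 4, 1, 5]
running_max = inclusive_scan_ref(data, max)                     # [3, 3, 4, 4, 5]
running_product = inclusive_scan_ref(data, lambda a, b: a * b)  # [3, 3, 12, 12, 60]

# The final element of an inclusive scan equals the reduction of the whole input.
assert running_max[-1] == reduce(max, data)
assert running_product[-1] == reduce(lambda a, b: a * b, data)
```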

pylibcudf.reduce.unique_count(Column source, null_policy null_handling, nan_policy nan_handling, Stream stream=None) → size_type

Returns the number of unique consecutive elements in the input column.

For details, see cudf::unique_count().

Parameters:

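source (Column): The input column whose unique consecutive elements are counted.
null_handling (null_policy): Whether to include or exclude nulls from the count.
nan_handling (nan_policy): Whether NaN values are treated as nulls or as valid values.
stream (Stream | None): CUDA stream on which to perform the operation.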

Returns:

size_type: The number of unique consecutive elements in the input column.

Notes

If the input column is sorted, unique_count can produce the same result as distinct_count, but faster.
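The note can be checked with a small sketch: unique_count counts runs of equal neighbours, so it matches a distinct count only once equal values are adjacent. The helpers `unique_count_ref` and `distinct_count_simple` are hypothetical pure-Python models that ignore nulls and NaNs. One plausible reason for the speed difference is that counting runs needs only neighbour comparisons rather than a hash table, though that is an assumption about the implementation.

```python
from itertools import groupby


def unique_count_ref(values):
    """Count runs of consecutive equal values (no nulls or NaNs in this model)."""
    return sum(1 for _key, _run in groupby(values))


def distinct_count_simple(values):
    """Count distinct values regardless of position."""
    return len(set(values))


unsorted_values = [1, 1, 2, 1, 3, 3]
print(unique_count_ref(unsorted_values), distinct_count_simple(unsorted_values))  # 4 3

sorted_values = sorted(unsorted_values)
print(unique_count_ref(sorted_values), distinct_count_simple(sorted_values))  # 3 3
```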
