reduce

pylibcudf.reduce.ScanType

See also scan_type.

Enum members

  • INCLUSIVE

  • EXCLUSIVE

pylibcudf.reduce.distinct_count(Column source, null_policy null_handling, nan_policy nan_handling, Stream stream=None) -> size_type

Returns the number of distinct elements in the input column.

For details, see cudf::distinct_count().

Parameters:
source : Column

The input column whose distinct elements are counted.

null_handling : null_policy

Flag to include or exclude nulls from the count.

nan_handling : nan_policy

Flag to include or exclude NaNs from the count.

Returns:
size_type

The number of distinct elements in the input column.
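
The effect of the two flags can be illustrated with a pure-Python model. This is a sketch of the counting semantics only, not the pylibcudf implementation: `distinct_count_model` and its boolean parameters `include_nulls` and `include_nans` are hypothetical names, `None` stands in for a null, and the sketch assumes that all nulls count as one distinct value and all NaNs as one distinct value. It also treats the two flags as independent switches, which may not match how the real policies interact.

```python
import math


def distinct_count_model(values, include_nulls, include_nans):
    """Count distinct values in a list (hypothetical reference model).

    None stands in for a null and float('nan') for a NaN.
    """
    seen = set()       # distinct ordinary values
    has_null = False   # whether at least one null was encountered
    has_nan = False    # whether at least one NaN was encountered
    for v in values:
        if v is None:
            has_null = True
        elif isinstance(v, float) and math.isnan(v):
            has_nan = True
        else:
            seen.add(v)
    count = len(seen)
    # Assumption: all nulls together contribute one distinct value.
    if include_nulls and has_null:
        count += 1
    # Assumption: all NaNs together contribute one distinct value.
    if include_nans and has_nan:
        count += 1
    return count


data = [1.0, 2.0, 2.0, None, None, float("nan")]
print(distinct_count_model(data, include_nulls=True, include_nans=True))    # 4
print(distinct_count_model(data, include_nulls=False, include_nans=False))  # 2
```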

pylibcudf.reduce.is_valid_reduce_aggregation(DataType source, Aggregation agg) -> bool

Return whether an aggregation is supported for a given data type.

Parameters:
source : DataType

The type of the column the aggregation is being performed on.

agg : Aggregation

The aggregation.

Returns:
bool

True if the aggregation is supported.

pylibcudf.reduce.minmax(Column col, Stream stream=None, DeviceMemoryResource mr=None) -> tuple

Compute the minimum and maximum of a column.

For details, see cudf::minmax documentation.

Parameters:
col : Column

The column to compute the minimum and maximum of.

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned scalars’ device memory.

Returns:
tuple

A tuple of two Scalars, the first being the minimum and the second being the maximum.
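
The shape of the result, a (minimum, maximum) pair, can be sketched in plain Python. The function `minmax_model` is a hypothetical stand-in that works on Python lists rather than device columns; it only illustrates that both extremes can be found in one pass and returned together, and says nothing about how the library computes them.

```python
def minmax_model(values):
    """Return (minimum, maximum) of a non-empty sequence in a single pass."""
    it = iter(values)
    try:
        lo = hi = next(it)  # seed both extremes with the first element
    except StopIteration:
        raise ValueError("minmax_model() requires at least one value")
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


# The tuple unpacks in the same order as described above: minimum first.
smallest, largest = minmax_model([3, -1, 7, 4])
print(smallest, largest)  # -1 7
```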

pylibcudf.reduce.reduce(Column col, Aggregation agg, DataType data_type, Scalar init=None, Stream stream=None, DeviceMemoryResource mr=None) -> Scalar

Perform a reduction on a column.

For details, see cudf::reduce documentation.

Parameters:
col : Column

The column to perform the reduction on.

agg : Aggregation

The aggregation to perform.

data_type : DataType

The data type of the result.

init : Scalar | None

The initial value for the reduction.

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned scalar’s device memory.

Returns:
Scalar

The result of the reduction.
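
How the aggregation, result type, and initial value fit together can be modelled in plain Python. This is a sketch under stated assumptions, not the library's code: `reduce_model` is a hypothetical helper, a binary Python callable stands in for the Aggregation, a Python type stands in for `data_type`, and `init` is assumed to be folded in as the starting accumulator.

```python
import operator
from functools import reduce as fold


def reduce_model(values, op, result_type, init=None):
    """Fold `values` with the binary function `op` (hypothetical reference model).

    `result_type` plays the role of data_type: the result is converted to it.
    `init`, when given, is assumed to seed the accumulator.
    """
    if init is None:
        result = fold(op, values)
    else:
        result = fold(op, values, init)
    return result_type(result)


print(reduce_model([1, 2, 3, 4], operator.add, int))            # 10
print(reduce_model([1, 2, 3, 4], operator.add, int, init=100))  # 110
print(reduce_model([1, 2, 3, 4], operator.add, float))          # 10.0
print(reduce_model([5, 2, 9], min, int))                        # 2
```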

pylibcudf.reduce.scan(Column col, Aggregation agg, scan_type inclusive, Stream stream=None, DeviceMemoryResource mr=None) -> Column

Perform a scan on a column.

For details, see cudf::scan documentation.

Parameters:
col : Column

The column to perform the scan on.

agg : Aggregation

The aggregation to perform.

inclusive : scan_type

The type of scan to perform.

stream : Stream | None

CUDA stream on which to perform the operation.

mr : DeviceMemoryResource | None

Device memory resource used to allocate the returned column’s device memory.

Returns:
Column

The result of the scan.
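
The difference between the two scan types can be shown with a pure-Python model of a prefix scan. The helpers `inclusive_scan` and `exclusive_scan` are hypothetical and operate on lists. The sketch assumes the usual definitions: in an inclusive scan, output element i combines inputs 0 through i; in an exclusive scan, it combines inputs 0 through i-1, and the first output is the identity of the operation (0 for a sum).

```python
import operator
from itertools import accumulate


def inclusive_scan(values, op):
    """Output i combines inputs 0..i."""
    return list(accumulate(values, op))


def exclusive_scan(values, op, identity):
    """Output i combines inputs 0..i-1; the first output is `identity`."""
    values = list(values)
    if not values:
        return []
    # Drop the last input and prepend the identity as the starting value.
    return list(accumulate(values[:-1], op, initial=identity))


data = [1, 2, 3, 4]
print(inclusive_scan(data, operator.add))     # [1, 3, 6, 10]
print(exclusive_scan(data, operator.add, 0))  # [0, 1, 3, 6]
```

Both results have the same length as the input, which matches the function returning a Column rather than a Scalar.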

pylibcudf.reduce.unique_count(Column source, null_policy null_handling, nan_policy nan_handling, Stream stream=None) -> size_type

Returns the number of consecutive groups of equivalent elements in the input column.

For details, see cudf::unique_count().

Parameters:
source : Column

The input column to count the unique elements of.

null_handling : null_policy

Flag to include or exclude nulls from the count.

nan_handling : nan_policy

Flag to include or exclude NaNs from the count.

Returns:
size_type

The number of consecutive groups of equivalent elements in the input column.

Notes

If the input column is sorted, then unique_count can produce the same result as distinct_count, but faster.
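
The relationship in this note can be checked with a small pure-Python model. The functions `distinct_count_model` and `unique_count_model` are hypothetical reference models on lists, with nulls and NaNs left out for simplicity: the first counts distinct values, the second counts runs of equal adjacent values.

```python
from itertools import groupby


def distinct_count_model(values):
    """Number of distinct values, wherever they appear."""
    return len(set(values))


def unique_count_model(values):
    """Number of runs of equal adjacent values."""
    return sum(1 for _ in groupby(values))


data = [1, 1, 2, 1, 3, 3]
print(distinct_count_model(data))        # 3 (values 1, 2, 3)
print(unique_count_model(data))          # 4 (runs: 1 1 | 2 | 1 | 3 3)
print(unique_count_model(sorted(data)))  # 3 (runs: 1 1 1 | 2 | 3 3)
```

On unsorted input the two counts differ because the value 1 appears in two separate runs. After sorting, equal values are adjacent, so counting runs gives the distinct count with a single linear pass over neighbouring elements.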