Column Reduction

group Reduction

Functions

cudf::size_type distinct_count(column_view const &input, null_policy null_handling, nan_policy nan_handling, rmm::cuda_stream_view stream = cudf::get_default_stream())

Count the distinct elements in the column_view.

Given an input column_view, returns the number of distinct elements in it. This overload has no nulls_equal parameter: nulls are always handled as equal, so all included nulls count as a single distinct value.

If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_NULL, both NaN and null values are ignored. If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_VALID, only nulls are ignored, and NaN is included in the distinct count.

Parameters:

input – [in] The column_view whose distinct elements will be counted
null_handling – [in] flag to include or ignore nulls while counting
nan_handling – [in] flag to treat NaN as null (NAN_IS_NULL) or as a valid value (NAN_IS_VALID)
stream – [in] CUDA stream used for device memory operations and kernel launches

Returns:

number of distinct elements in the column
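The policy combinations above can be checked against a small CPU reference. The sketch below is a hypothetical illustration, not libcudf code: the enums are local stand-ins for cudf::null_policy and cudf::nan_policy, std::nullopt models a null element, and the INCLUDE behavior (all nulls, and NaN under NAN_IS_NULL, collapse into one group) is an assumption drawn from "nulls are handled as equal" and "NaN==null".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-ins for cudf::null_policy / cudf::nan_policy (illustration only).
enum class null_policy { EXCLUDE, INCLUDE };
enum class nan_policy { NAN_IS_NULL, NAN_IS_VALID };

// CPU reference for the documented semantics; std::nullopt models a null element.
std::size_t reference_distinct_count(std::vector<std::optional<double>> const& input,
                                     null_policy null_handling,
                                     nan_policy nan_handling)
{
  std::set<double> values;  // distinct non-null, non-NaN values
  bool saw_null = false;    // nulls compare equal, so they form at most one group
  bool saw_nan  = false;    // NaNs form at most one group when treated as valid
  for (auto const& v : input) {
    // Under NAN_IS_NULL a NaN is treated exactly like a null (assumption, see lead-in).
    bool const is_null =
      !v.has_value() || (nan_handling == nan_policy::NAN_IS_NULL && std::isnan(*v));
    if (is_null) {
      saw_null = true;
    } else if (std::isnan(*v)) {
      saw_nan = true;
    } else {
      values.insert(*v);
    }
  }
  std::size_t count = values.size() + (saw_nan ? 1 : 0);
  if (null_handling == null_policy::INCLUDE && saw_null) { ++count; }
  return count;
}
```

For the column {1.0, 1.0, null, null, NaN, 2.0}, this reference returns 2 for EXCLUDE/NAN_IS_NULL, 3 for EXCLUDE/NAN_IS_VALID, 3 for INCLUDE/NAN_IS_NULL, and 4 for INCLUDE/NAN_IS_VALID.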

cudf::size_type distinct_count(table_view const &input, null_equality nulls_equal = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream())

Count the distinct rows in a table.

Parameters:

input – [in] Table whose distinct rows will be counted
nulls_equal – [in] flag to denote whether null elements should be considered equal; nulls are not equal if null_equality::UNEQUAL.
stream – [in] CUDA stream used for device memory operations and kernel launches

Returns:

number of distinct rows in the table
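The effect of nulls_equal on whole rows can be shown with a CPU reference. This is a hypothetical sketch, not libcudf code: null_equality is a local stand-in for cudf::null_equality, rows are vectors of std::optional<int>, and it assumes that under UNEQUAL a row containing any null matches no other row.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

enum class null_equality { EQUAL, UNEQUAL };  // hypothetical stand-in for cudf::null_equality

using row = std::vector<std::optional<int>>;  // one table row; std::nullopt is a null element

std::size_t reference_distinct_rows(std::vector<row> const& table, null_equality nulls_equal)
{
  std::set<row> seen;            // two empty optionals compare equal, so nulls match here
  std::size_t unequal_rows = 0;  // rows that can never match any other row
  for (auto const& r : table) {
    bool has_null = false;
    for (auto const& e : r) {
      if (!e) { has_null = true; }
    }
    if (nulls_equal == null_equality::UNEQUAL && has_null) {
      ++unequal_rows;  // null != null, so this row is distinct from every other row
    } else {
      seen.insert(r);
    }
  }
  return seen.size() + unequal_rows;
}
```

For the rows {1, null}, {1, null}, {1, 2}, {1, 2}, this reference returns 2 with EQUAL and 3 with UNEQUAL, because the two null-containing rows no longer match each other.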

cudf::size_type unique_count(column_view const &input, null_policy null_handling, nan_policy nan_handling, rmm::cuda_stream_view stream = cudf::get_default_stream())

Count the number of consecutive groups of equivalent rows in a column.

nulls are handled as equal.

Parameters:

input – [in] The column_view whose consecutive groups of equivalent rows will be counted
null_handling – [in] flag to include or ignore nulls while counting
nan_handling – [in] flag to treat NaN as null (NAN_IS_NULL) or as a valid value (NAN_IS_VALID)
stream – [in] CUDA stream used for device memory operations and kernel launches

Returns:

number of consecutive groups of equivalent rows in the column
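The difference between unique_count and distinct_count is easiest to see on unsorted input. The CPU sketch below is illustrative only (plain int values, no nulls or NaNs, and hypothetical function names): unique_count counts runs of equal adjacent values, while distinct_count counts different values anywhere in the column.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// unique_count semantics: number of runs of equal adjacent values.
std::size_t reference_unique_count(std::vector<int> const& input)
{
  std::size_t groups = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (i == 0 || input[i] != input[i - 1]) { ++groups; }  // a new run starts here
  }
  return groups;
}

// distinct_count semantics: number of different values anywhere in the column.
std::size_t reference_distinct_count(std::vector<int> const& input)
{
  return std::set<int>(input.begin(), input.end()).size();
}
```

For {1, 1, 2, 1}, the run-based count is 3 (the trailing 1 starts a new run) while the distinct count is 2.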

cudf::size_type unique_count(table_view const &input, null_equality nulls_equal = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream())

Count the number of consecutive groups of equivalent rows in a table.

Parameters:

input – [in] Table whose consecutive groups of equivalent rows will be counted
nulls_equal – [in] flag to denote whether null elements should be considered equal; nulls are not equal if null_equality::UNEQUAL.
stream – [in] CUDA stream used for device memory operations and kernel launches

Returns:

number of consecutive groups of equivalent rows in the table
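Because equal rows become adjacent after sorting, counting consecutive groups on sorted rows yields the number of distinct rows. The sketch below shows that relationship on the CPU with a hypothetical two-column row type (std::pair<int, int>); it illustrates the idea only and does not call libcudf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using row = std::pair<int, int>;  // a two-column table row (illustration only)

// Counts consecutive groups of equal rows, mirroring unique_count on a table.
std::size_t count_consecutive_groups(std::vector<row> const& rows)
{
  std::size_t groups = rows.empty() ? 0 : 1;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (rows[i] != rows[i - 1]) { ++groups; }  // each change of row opens a new group
  }
  return groups;
}

// After sorting, equal rows are adjacent, so consecutive groups == distinct rows.
std::size_t distinct_via_sort(std::vector<row> rows)
{
  std::sort(rows.begin(), rows.end());
  return count_consecutive_groups(rows);
}
```

On {1, 2}, {3, 4}, {1, 2}, the unsorted group count is 3, while the sorted count is 2, matching the number of distinct rows.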

class approx_distinct_count

#include <approx_distinct_count.hpp>

Object-oriented HyperLogLog sketch for approximate distinct counting.

This class provides an object-oriented interface to HyperLogLog sketches, allowing incremental addition of data and cardinality estimation.

The implementation uses XXHash64 to hash table rows into 64-bit values, which are then added to the HyperLogLog sketch without additional hashing (identity function).
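The add/estimate cycle of a HyperLogLog sketch can be illustrated on the CPU. The block below is a minimal sketch of the classic HyperLogLog estimator (Flajolet et al.) with a linear-counting correction for small cardinalities; it is not the libcudf implementation, omits any HLL++ bias corrections, and swaps in a SplitMix64 mixer (hypothetical helper mix64) for XXHash64. As in the description above, add_hash takes values that are already hashed and applies no further hashing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// SplitMix64 finalizer, used here as a stand-in for XXHash64 (assumption).
std::uint64_t mix64(std::uint64_t x)
{
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Counts leading zero bits; returns 64 for w == 0.
int leading_zeros(std::uint64_t w)
{
  int n = 0;
  for (std::uint64_t mask = 1ULL << 63; mask != 0 && (w & mask) == 0; mask >>= 1) { ++n; }
  return n;
}

class hll_sketch {
 public:
  explicit hll_sketch(int precision)
    : p_{precision}, registers_(std::size_t{1} << precision, 0) {}

  // Adds an already-hashed 64-bit value (identity function, no rehashing).
  void add_hash(std::uint64_t h)
  {
    std::size_t const index = h >> (64 - p_);  // top p bits select a register
    std::uint64_t const rest = h << p_;        // remaining 64 - p bits, left-aligned
    std::uint32_t const rank = std::min(leading_zeros(rest), 64 - p_) + 1;
    registers_[index] = std::max(registers_[index], rank);  // keep the largest rank seen
  }

  double estimate() const
  {
    double const m = static_cast<double>(registers_.size());
    double sum = 0.0;
    std::size_t zeros = 0;
    for (auto r : registers_) {
      sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^-r
      if (r == 0) { ++zeros; }
    }
    double const alpha = 0.7213 / (1.0 + 1.079 / m);  // bias constant, valid for m >= 128
    double e = alpha * m * m / sum;                   // raw harmonic-mean estimate
    if (e <= 2.5 * m && zeros > 0) { e = m * std::log(m / zeros); }  // linear counting
    return e;
  }

 private:
  int p_;
  std::vector<std::uint32_t> registers_;  // 4-byte registers, matching the memory figures
};
```

Adding the same values a second time leaves every register unchanged, so the estimate is identical; this is what makes incremental add() calls safe with respect to duplicates.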

Common precision values:

p = 10: m = 1,024 registers, ~3.2% standard error, 4KB memory
p = 12 (default): m = 4,096 registers, ~1.6% standard error, 16KB memory
p = 14: m = 16,384 registers, ~0.8% standard error, 64KB memory
p = 16: m = 65,536 registers, ~0.4% standard error, 256KB memory

HyperLogLog Precision Parameter

The precision parameter (p) is the number of bits used to index into the register array. It determines the number of registers (m = 2^p) in the HLL sketch:

Memory usage: 2^p * 4 bytes (m registers of 4 bytes each for GPU atomics)
Standard error: 1.04 / sqrt(m) = 1.04 / sqrt(2^p)

Valid range: p ∈ [4, 18]. This is not a hard theoretical limit but an empirically recommended range:

Below 4: Too few registers for HLL’s statistical assumptions, resulting in high variance and unstable estimates.
Above 18: Rapidly diminishing accuracy gains while incurring significant memory growth, making the structure no longer space-efficient for approximate counting.

This range represents a practical engineering compromise from HLL++ and is widely adopted by systems such as Apache Spark. The default of 12 aligns with Spark’s configuration and is the largest precision that fits efficiently in GPU shared memory, enabling optimal performance for our implementation.
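The formulas above can be turned into small helpers that reproduce the precision table. These function names are hypothetical and only restate the documented relationships: m = 2^p registers, 4 bytes per register, a standard error of 1.04 / sqrt(m), and the recommended range p ∈ [4, 18].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Number of registers m = 2^p.
std::size_t num_registers(std::int32_t p) { return std::size_t{1} << p; }

// Memory usage: m registers of 4 bytes each.
std::size_t memory_bytes(std::int32_t p) { return num_registers(p) * 4; }

// Relative standard error: 1.04 / sqrt(m).
double standard_error(std::int32_t p)
{
  return 1.04 / std::sqrt(static_cast<double>(num_registers(p)));
}

// Empirically recommended range stated above.
bool is_recommended_precision(std::int32_t p) { return p >= 4 && p <= 18; }
```

For the default p = 12 these give 4,096 registers, 16,384 bytes, and a standard error of 0.01625 (about 1.6%), matching the table.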

Example usage:

auto adc = cudf::approx_distinct_count(table1);
auto count1 = adc.estimate();

adc.add(table2);
auto count2 = adc.estimate();

Public Types

using impl_type = cudf::detail::approx_distinct_count<cudf::hashing::detail::XXHash_64>

Implementation type.

Public Functions

approx_distinct_count(table_view const &input, std::int32_t precision = 12, null_policy null_handling = null_policy::EXCLUDE, nan_policy nan_handling = nan_policy::NAN_IS_NULL, rmm::cuda_stream_view stream = cudf::get_default_stream())

Constructs an approximate distinct count sketch from a table with specified precision.

Parameters:

input – Table whose rows will be added to the sketch
precision – The precision parameter for HyperLogLog (4-18). Higher precision gives better accuracy but uses more memory. Default is 12.
null_handling – INCLUDE or EXCLUDE rows with nulls (default: EXCLUDE)
nan_handling – NAN_IS_VALID or NAN_IS_NULL (default: NAN_IS_NULL)
stream – CUDA stream used for device memory operations and kernel launches

approx_distinct_count(table_view const &input, desired_standard_error error, null_policy null_handling = null_policy::EXCLUDE, nan_policy nan_handling = nan_policy::NAN_IS_NULL, rmm::cuda_stream_view stream = cudf::get_default_stream())

Constructs an approximate distinct count sketch from a table with specified standard error.

This constructor allows specifying the desired standard error (error tolerance) directly, which is more intuitive than specifying the precision parameter. The precision is calculated as: ceil(2 * log2(1.04 / standard_error)).

Since precision must be an integer, the actual standard error may be better (smaller) than requested. Use the standard_error() getter to retrieve the actual value.
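The conversion from a requested error to a precision, and the error actually achieved after rounding up, can be sketched as follows. The helper names are hypothetical; the formula is the one given above. This sketch does not clamp the result to [4, 18], since whether the library clamps is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

// p = ceil(2 * log2(1.04 / standard_error)), per the formula above.
std::int32_t precision_for_error(double standard_error)
{
  if (!(standard_error > 0.0)) {
    throw std::invalid_argument("standard error must be positive");
  }
  return static_cast<std::int32_t>(std::ceil(2.0 * std::log2(1.04 / standard_error)));
}

// Error actually achieved once the precision is rounded up to an integer.
double achieved_error(std::int32_t p) { return 1.04 / std::sqrt(std::ldexp(1.0, p)); }
```

Requesting 0.01 gives p = 14 (2 * log2(104) is about 13.4), and the achieved error at p = 14 is about 0.0081, better than requested.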

Parameters:

input – Table whose rows will be added to the sketch
error – The desired standard error (e.g., approx_distinct_count::desired_standard_error{0.01} for ~1%)
null_handling – INCLUDE or EXCLUDE rows with nulls (default: EXCLUDE)
nan_handling – NAN_IS_VALID or NAN_IS_NULL (default: NAN_IS_NULL)
stream – CUDA stream used for device memory operations and kernel launches

Throws:

std::invalid_argument – if error.value is not positive

approx_distinct_count(cuda::std::span<cuda::std::byte> sketch_span, std::int32_t precision, null_policy null_handling = null_policy::EXCLUDE, nan_policy nan_handling = nan_policy::NAN_IS_NULL)

Constructs a non-owning sketch that operates on user-allocated storage.

This constructor creates a sketch that operates directly on the provided storage without copying. This enables zero-copy operations on pre-existing buffers, such as sketch data stored in a column or received from another process.

Warning

The caller must ensure the storage remains valid for the lifetime of this object. The sketch will read from and write to the provided storage directly.

Parameters:

sketch_span – The sketch bytes to operate on (must remain valid)
precision – The precision parameter for the sketch (4-18)
null_handling – INCLUDE or EXCLUDE rows with nulls (default: EXCLUDE)
nan_handling – NAN_IS_VALID or NAN_IS_NULL (default: NAN_IS_NULL)

approx_distinct_count(approx_distinct_count&&) = default

Default move constructor.

approx_distinct_count &operator=(approx_distinct_count&&) = default

Move assignment operator.

Returns: A reference to this object

void add(table_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream())

Adds rows from a table to the sketch.

Parameters:

input – Table whose rows will be added
stream – CUDA stream used for device memory operations and kernel launches

void merge(approx_distinct_count const &other, rmm::cuda_stream_view stream = cudf::get_default_stream())

Merges another sketch into this sketch.

After merging, this sketch estimates the number of distinct rows in the combined input of both sketches.

Throws:

std::invalid_argument – if the sketches have different precision values
std::invalid_argument – if the sketches have different null handling policies
std::invalid_argument – if the sketches have different NaN handling policies

Parameters:

other – The sketch to merge into this sketch
stream – CUDA stream used for device memory operations and kernel launches
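In standard HyperLogLog, merging two sketches takes the register-wise maximum: each register already holds the maximum rank over the values routed to it, and max is associative and commutative, so the merged registers equal those of a sketch built from both inputs. The CPU sketch below shows that operation on plain register arrays; merge_registers is a hypothetical name, not the libcudf kernel, and it mirrors only the documented precision check (differing precisions mean differing register counts).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Merges HLL registers in place: each register keeps the larger rank.
void merge_registers(std::vector<std::uint32_t>& into, std::vector<std::uint32_t> const& other)
{
  if (into.size() != other.size()) {
    // Register counts differ exactly when the precisions differ.
    throw std::invalid_argument("sketches have different precision values");
  }
  for (std::size_t i = 0; i < into.size(); ++i) {
    into[i] = std::max(into[i], other[i]);
  }
}
```

Merging {1, 3, 0, 2} with {2, 1, 4, 2} yields {2, 3, 4, 2}.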

void merge(cuda::std::span<cuda::std::byte const> sketch_span, rmm::cuda_stream_view stream = cudf::get_default_stream())

Merges a sketch from raw bytes into this sketch.

This allows merging sketches that have been serialized or created elsewhere, enabling distributed distinct counting scenarios.

Warning

It is the caller’s responsibility to ensure that the provided sketch span was created with the same approx_distinct_count configuration (precision, null/NaN handling, etc.) as this sketch. Merging incompatible sketches will produce incorrect results.

Parameters:

sketch_span – The sketch bytes to merge into this sketch
stream – CUDA stream used for device memory operations and kernel launches

std::size_t estimate(rmm::cuda_stream_view stream = cudf::get_default_stream()) const

Estimates the approximate number of distinct rows in the sketch.

Parameters: stream – CUDA stream used for device memory operations and kernel launches
Returns: Approximate number of distinct rows

cuda::std::span<cuda::std::byte> sketch() noexcept

Gets the raw sketch bytes for serialization or external merging.

The returned span provides access to the internal sketch storage. This can be used to serialize the sketch, transfer it between processes, or merge it with other sketches using the span-based merge API.

Returns: A span view of the sketch bytes

cuda::std::span<cuda::std::byte const> sketch() const noexcept

Gets the raw sketch bytes for serialization or external merging (const overload).

Returns: A span view of the sketch bytes

null_policy null_handling() const noexcept

Gets the null handling policy for this sketch.

Returns: The null policy set at construction

nan_policy nan_handling() const noexcept

Gets the NaN handling policy for this sketch.

Returns: The NaN policy set at construction

std::int32_t precision() const noexcept

Gets the precision parameter for this sketch.

Returns: The precision value set at construction

double standard_error() const noexcept

Gets the standard error (error tolerance) for this sketch.

The standard error is calculated from the precision as 1.04 / sqrt(2^precision). It is the relative standard error of the cardinality estimate, that is, the standard deviation of the estimate divided by the true distinct count.

Returns: The actual standard error based on the sketch’s precision

Public Static Functions

static std::size_t sketch_bytes(std::int32_t precision)

Gets the number of bytes required for sketch storage at a given precision.

Parameters: precision – The HLL precision parameter (4-18)
Returns: The number of bytes required for the sketch

static std::size_t sketch_alignment()

Gets the alignment required for sketch storage.

Returns: The required alignment in bytes

struct desired_standard_error

#include <approx_distinct_count.hpp>

Strong type wrapper for the desired standard error constructor parameter.

Use this type to construct an approx_distinct_count with a desired error tolerance instead of specifying precision directly.

Example:

auto sketch = cudf::approx_distinct_count(
  table, cudf::approx_distinct_count::desired_standard_error{0.01});

Public Functions

inline explicit constexpr desired_standard_error(double v)

Constructs a desired_standard_error with the given value.

Parameters: v – The requested standard error value (must be positive, e.g., 0.01 for ~1% error)

Public Members

double value

The requested standard error value (must be positive).
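The strong type lets a precision-taking constructor and an error-taking constructor coexist while making the caller's intent explicit at the call site. The sketch below reproduces that pattern with hypothetical free functions (choose_precision) instead of the real constructors; the struct is a local stand-in shaped like the documented one.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Local stand-in shaped like the documented wrapper (illustration only).
struct desired_standard_error {
  explicit constexpr desired_standard_error(double v) : value{v} {}
  double value;  // requested standard error, must be positive
};

// Overload taking a precision directly.
std::int32_t choose_precision(std::int32_t precision) { return precision; }

// Overload taking a desired error; converts it with ceil(2 * log2(1.04 / error)).
std::int32_t choose_precision(desired_standard_error error)
{
  return static_cast<std::int32_t>(std::ceil(2.0 * std::log2(1.04 / error.value)));
}
```

Because the constructor is explicit, a bare double never converts to desired_standard_error implicitly. A call such as choose_precision(0.01) would still compile, but it selects the integer overload and truncates 0.01 to 0, which is why callers spell out desired_standard_error{0.01}.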
