stream_compaction

pylibcudf.stream_compaction.DuplicateKeepOption

See also cudf::duplicate_keep_option.

Enum members

  • KEEP_ANY

  • KEEP_FIRST

  • KEEP_LAST

  • KEEP_NONE

pylibcudf.stream_compaction.apply_boolean_mask(Table source_table, Column boolean_mask) → Table

Filters the input table, keeping only the rows for which the boolean mask is true.

For details, see cudf::apply_boolean_mask().

Parameters:
source_table : Table

The input table to filter.

boolean_mask : Column

The boolean mask to apply to the input table.

Returns:
Table

A new table with rows removed based on the boolean mask.
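The masking rule can be modelled in plain Python. This is a minimal sketch of the semantics only, not the pylibcudf implementation: a list of tuples stands in for a Table, a list of booleans for the mask Column, and `apply_boolean_mask_model` is a hypothetical helper name. Treating a null mask entry (modelled as `None`) as "drop the row" is an assumption based on the libcudf description of this function.

```python
def apply_boolean_mask_model(rows, mask):
    """Keep rows[i] only where mask[i] is True.

    Assumption: a null mask entry (None here) drops the row, like False.
    """
    if len(rows) != len(mask):
        raise ValueError("mask length must match the number of rows")
    return [row for row, keep in zip(rows, mask) if keep is True]


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
mask = [True, False, None, True]

filtered = apply_boolean_mask_model(rows, mask)
print(filtered)  # [(1, 'a'), (4, 'd')]
```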

pylibcudf.stream_compaction.distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Table

Get the distinct rows from the input table.

For details, see cudf::distinct().

Parameters:
input : Table

The input table to filter.

keys : list

The list of column indexes to consider for distinct filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Table

A new table with distinct rows from the input table. The output will not necessarily be in the same order as the input.

pylibcudf.stream_compaction.distinct_count(Column source, null_policy null_handling, nan_policy nan_handling) → size_type

Returns the number of distinct elements in the input column.

For details, see cudf::distinct_count().

Parameters:
source : Column

The input column whose distinct elements are counted.

null_handling : null_policy

Flag to include or exclude nulls from the count.

nan_handling : nan_policy

Flag to include or exclude NaNs from the count.

Returns:
size_type

The number of distinct elements in the input column.

pylibcudf.stream_compaction.distinct_indices(Table input, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Column

Get the indices of the distinct rows from the input table.

For details, see cudf::distinct_indices().

Parameters:
input : Table

The input table to filter.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Column

A new column with the indices of the distinct rows from the input table.
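To illustrate how the keep option selects an index from each group of equal rows, here is a pure-Python model. It is a sketch under stated assumptions, not the library's algorithm: rows are tuples, the strings "first", "last", "any" and "none" stand in for the DuplicateKeepOption members, and `distinct_indices_model` is a hypothetical name. The model sorts its result for readability and resolves "any" to the first occurrence; neither choice is claimed for the real function.

```python
from collections import defaultdict


def distinct_indices_model(rows, keep="first"):
    """Return the row indices that survive de-duplication of whole rows."""
    groups = defaultdict(list)  # row value -> every index where it occurs
    for i, row in enumerate(rows):
        groups[row].append(i)

    kept = []
    for indices in groups.values():
        if keep in ("first", "any"):  # "any" may pick any member; we pick the first
            kept.append(indices[0])
        elif keep == "last":
            kept.append(indices[-1])
        elif keep == "none":  # drop every row that has a duplicate
            if len(indices) == 1:
                kept.append(indices[0])
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
    return sorted(kept)  # sorted only to make the output easy to read


rows = [(1, "x"), (2, "y"), (1, "x"), (3, "z"), (2, "y")]

print(distinct_indices_model(rows, "first"))  # [0, 1, 3]
print(distinct_indices_model(rows, "last"))   # [2, 3, 4]
print(distinct_indices_model(rows, "none"))   # [3]
```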

pylibcudf.stream_compaction.drop_nans(Table source_table, list keys, size_type keep_threshold) → Table

Filters out rows from the input table based on the presence of NaNs.

For details, see cudf::drop_nans().

Parameters:
source_table : Table

The input table to filter.

keys : List[size_type]

The list of column indexes to consider for NaN filtering.

keep_threshold : size_type

The minimum number of non-NaNs required to keep a row.

Returns:
Table

A new table with rows removed based on NaNs.

pylibcudf.stream_compaction.drop_nulls(Table source_table, list keys, size_type keep_threshold) → Table

Filters out rows from the input table based on the presence of nulls.

For details, see cudf::drop_nulls().

Parameters:
source_table : Table

The input table to filter.

keys : List[size_type]

The list of column indexes to consider for null filtering.

keep_threshold : size_type

The minimum number of non-nulls required to keep a row.

Returns:
Table

A new table with rows removed based on the null count.
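The interaction between keys and keep_threshold is easiest to see in a small model. The sketch below is plain Python rather than pylibcudf: rows are tuples, `None` stands in for a null, and `drop_nulls_model` is a hypothetical helper. A row is kept when at least keep_threshold of its key columns are non-null; columns outside keys are ignored.

```python
def drop_nulls_model(rows, keys, keep_threshold):
    """Keep a row if at least keep_threshold of its key columns are non-null."""
    kept = []
    for row in rows:
        non_null = sum(1 for k in keys if row[k] is not None)
        if non_null >= keep_threshold:
            kept.append(row)
    return kept


rows = [
    (1, None, "a"),
    (None, None, "b"),
    (3, 4.0, None),  # the null in column 2 is ignored because 2 is not a key
    (5, 6.0, "c"),
]

# Require both key columns to be non-null.
strict = drop_nulls_model(rows, keys=[0, 1], keep_threshold=2)
print(strict)   # [(3, 4.0, None), (5, 6.0, 'c')]

# Require only one non-null key column.
lenient = drop_nulls_model(rows, keys=[0, 1], keep_threshold=1)
print(lenient)  # [(1, None, 'a'), (3, 4.0, None), (5, 6.0, 'c')]
```

The drop_nans parameters describe the same threshold rule with non-NaN values counted in place of non-null values.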

pylibcudf.stream_compaction.stable_distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Table

Get the distinct rows from the input table, preserving input order.

For details, see cudf::stable_distinct().

Parameters:
input : Table

The input table to filter.

keys : list

The list of column indexes to consider for distinct filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Table

A new table with distinct rows from the input table, preserving the input table order.
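A pure-Python model can show what "preserving input order" means when duplicates are judged on a subset of columns. This is an illustrative sketch, not the library's implementation: rows are tuples, `stable_distinct_model` is a hypothetical name, and only the KEEP_FIRST and KEEP_LAST behaviours are modelled (through the assumed `keep_last` flag).

```python
def stable_distinct_model(rows, keys, keep_last=False):
    """De-duplicate on the key columns; surviving rows stay in input order."""
    chosen = {}  # key tuple -> index of the row kept for that key
    for i, row in enumerate(rows):
        key = tuple(row[k] for k in keys)
        if keep_last or key not in chosen:
            chosen[key] = i
    return [rows[i] for i in sorted(chosen.values())]


rows = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]

first = stable_distinct_model(rows, keys=[0])
print(first)  # [('b', 1), ('a', 2), ('c', 4)]

last = stable_distinct_model(rows, keys=[0], keep_last=True)
print(last)   # [('b', 3), ('c', 4), ('a', 5)]
```

Note that the survivors are emitted by their position in the input, not by key value, so the output is not sorted unless the input was.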

pylibcudf.stream_compaction.unique(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal) → Table

Filter duplicate consecutive rows from the input table.

For details, see cudf::unique().

Parameters:
input : Table

The input table to filter.

keys : list[int]

The list of column indexes to consider for filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

Returns:
Table

A new table with unique rows from each sequence of equivalent rows, as specified by keep, in the same order as the input table.

Notes

If the input columns to be filtered on are sorted, then unique can produce the same result as stable_distinct, but faster.
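The note above depends on unique comparing only adjacent rows. The following plain-Python sketch models that with `itertools.groupby`, which also groups consecutive equal keys. It is a model of the KEEP_FIRST behaviour only; `unique_model` is a hypothetical helper and tuples stand in for table rows.

```python
from itertools import groupby


def unique_model(rows, keys):
    """Keep the first row of every run of consecutive rows with equal keys."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    return [next(run) for _, run in groupby(rows, key=key_of)]


rows = [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (2, "e"), (2, "f")]

# Unsorted input: equal keys that are not adjacent survive as separate runs.
unsorted_result = unique_model(rows, keys=[0])
print(unsorted_result)  # [(1, 'a'), (2, 'c'), (1, 'd'), (2, 'e')]

# Sorted on the key column: every key forms one run, so the result is distinct.
sorted_rows = sorted(rows, key=lambda row: row[0])
sorted_result = unique_model(sorted_rows, keys=[0])
print(sorted_result)    # [(1, 'a'), (2, 'c')]
```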

pylibcudf.stream_compaction.unique_count(Column source, null_policy null_handling, nan_policy nan_handling) → size_type

Returns the number of unique consecutive elements in the input column.

For details, see cudf::unique_count().

Parameters:
source : Column

The input column whose unique consecutive elements are counted.

null_handling : null_policy

Flag to include or exclude nulls from the count.

nan_handling : nan_policy

Flag to include or exclude NaNs from the count.

Returns:
size_type

The number of unique consecutive elements in the input column.

Notes

If the input column is sorted, then unique_count can produce the same result as distinct_count, but faster.
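The difference between the two counts, and why they agree on sorted input, can be sketched in plain Python. Both helpers below are hypothetical models over a list of values with no nulls or NaNs, so the null_handling and nan_handling policies are deliberately left out.

```python
from itertools import groupby


def unique_count_model(values):
    """Count runs of consecutive equal values."""
    return sum(1 for _ in groupby(values))


def distinct_count_model(values):
    """Count distinct values regardless of position."""
    return len(set(values))


values = [3, 3, 1, 3, 2, 2, 1]

print(unique_count_model(values))          # 5 runs: 3, 1, 3, 2, 1
print(distinct_count_model(values))        # 3 values: 1, 2, 3
print(unique_count_model(sorted(values)))  # 3, equal to the distinct count
```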