stream_compaction

pylibcudf.stream_compaction.DuplicateKeepOption

See also cudf::duplicate_keep_option.

Enum members

KEEP_ANY
KEEP_FIRST
KEEP_LAST
KEEP_NONE
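KEEP_FIRST and KEEP_LAST retain the first or last row of each group of duplicates, KEEP_NONE drops every row that has a duplicate, and KEEP_ANY retains one row per group without specifying which. The sketch below is a pure-Python model of that behavior over a list of key values, not pylibcudf code; the helper name kept_rows and the string option names are hypothetical.

```python
from collections import defaultdict


def kept_rows(keys, keep):
    """Return the sorted indices of rows retained under a keep option.

    Hypothetical model: keep is "first", "last" or "none". KEEP_ANY is not
    modeled because the surviving row is unspecified.
    """
    groups = defaultdict(list)  # key value -> row indices in input order
    for i, key in enumerate(keys):
        groups[key].append(i)

    kept = []
    for rows in groups.values():
        if keep == "first":
            kept.append(rows[0])
        elif keep == "last":
            kept.append(rows[-1])
        elif keep == "none":
            if len(rows) == 1:  # only rows without any duplicate survive
                kept.append(rows[0])
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
    return sorted(kept)


keys = ["a", "b", "a", "a", "c"]
print(kept_rows(keys, "first"))  # [0, 1, 4]
print(kept_rows(keys, "last"))   # [1, 3, 4]
print(kept_rows(keys, "none"))   # [1, 4]
```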

pylibcudf.stream_compaction.apply_boolean_mask(Table source_table, Column boolean_mask) → Table

Filters out rows from the input table based on a boolean mask.

For details, see cudf::apply_boolean_mask().

Parameters:

source_table (Table): The input table to filter.
boolean_mask (Column): A boolean column with one element per row of source_table.

Returns:

Table: A new table containing only the rows whose mask element is true; rows whose mask element is false or null are removed.
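A minimal pure-Python model of the filtering rule, assuming a table is a list of row tuples and the mask is a list of True, False or None (null) values. mask_rows is a hypothetical helper, not part of pylibcudf.

```python
def mask_rows(rows, mask):
    """Keep row i only when mask[i] is True; False and None (null) drop it."""
    if len(rows) != len(mask):
        raise ValueError("mask must have one element per row")
    return [row for row, keep in zip(rows, mask) if keep is True]


rows = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]
mask = [True, False, None, True]
print(mask_rows(rows, mask))  # [(1, 'x'), (4, 'w')]
```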

pylibcudf.stream_compaction.distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Table

Get the distinct rows from the input table.

For details, see cudf::distinct().

Parameters:

input (Table): The input table to filter.
keys (list): The list of column indexes to consider for distinct filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Table: A new table with distinct rows from the input table. The output will not necessarily be in the same order as the input.
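The comparison options decide which rows count as duplicates. The pure-Python model below, which is not pylibcudf code, treats nulls_equal=True like null_equality EQUAL and nans_equal=True like nan_equality ALL_EQUAL; with either flag off, every null or NaN is unequal to every other. Rows are tuples, None is null, and distinct_rows is a hypothetical helper that keeps the first row of each group. The model happens to preserve input order, which distinct does not guarantee.

```python
import math


def _key_cell(value, nulls_equal, nans_equal):
    """Map one cell to a hashable token encoding the equality rules."""
    if value is None:
        # A fresh object() equals nothing else, modeling "unequal" nulls.
        return ("null",) if nulls_equal else object()
    if isinstance(value, float) and math.isnan(value):
        return ("nan",) if nans_equal else object()
    return ("value", value)


def distinct_rows(rows, keys, nulls_equal=True, nans_equal=True):
    """Keep the first row of each group whose key columns compare equal."""
    seen = set()
    out = []
    for row in rows:
        token = tuple(_key_cell(row[c], nulls_equal, nans_equal) for c in keys)
        if token not in seen:
            seen.add(token)
            out.append(row)
    return out


nan = float("nan")
rows = [(1, None), (1, None), (2, nan), (2, nan)]
print(len(distinct_rows(rows, keys=[0, 1])))                     # 2
print(len(distinct_rows(rows, keys=[0, 1], nulls_equal=False)))  # 3
print(len(distinct_rows(rows, keys=[0, 1], nans_equal=False)))   # 3
```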

pylibcudf.stream_compaction.distinct_count(Column source, null_policy null_handling, nan_policy nan_handling) → size_type

Returns the number of distinct elements in the input column.

For details, see cudf::distinct_count().

Parameters:

source (Column): The input column whose distinct elements are counted.
null_handling (null_policy): Whether nulls are included in (INCLUDE) or excluded from (EXCLUDE) the count.
nan_handling (nan_policy): Whether NaNs are treated as nulls (NAN_IS_NULL) or as valid values (NAN_IS_VALID).

Returns:

size_type: The number of distinct elements in the input column.
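Because nan_policy chooses between NAN_IS_NULL and NAN_IS_VALID, NaNs drop out of the count only when they are treated as nulls and nulls are excluded. The pure-Python model below illustrates the four combinations; it is not pylibcudf code, count_distinct is a hypothetical name, and it assumes that all nulls count as one element and all NaNs count as one element.

```python
import math


def count_distinct(values, include_nulls=True, nan_is_null=False):
    """Count distinct elements of a column modeled as a list (None = null).

    include_nulls models null_policy INCLUDE (True) / EXCLUDE (False);
    nan_is_null models nan_policy NAN_IS_NULL (True) / NAN_IS_VALID (False).
    """
    seen = set()
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        if v is None or (is_nan and nan_is_null):
            if include_nulls:
                seen.add(("null",))  # all nulls collapse to one element
        elif is_nan:
            seen.add(("nan",))  # all NaNs collapse to one element
        else:
            seen.add(("value", v))
    return len(seen)


values = [1.0, 2.0, 2.0, None, float("nan"), float("nan")]
print(count_distinct(values))                                         # 4
print(count_distinct(values, include_nulls=False))                    # 3
print(count_distinct(values, nan_is_null=True))                       # 3
print(count_distinct(values, include_nulls=False, nan_is_null=True))  # 2
```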

pylibcudf.stream_compaction.distinct_indices(Table input, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Column

Get the indices of the distinct rows from the input table.

For details, see cudf::distinct_indices().

Parameters:

input (Table): The input table to filter.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Column: A new column with the indices of the distinct rows from the input table.
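Because distinct_indices returns row indices rather than rows, gathering the input at those indices yields the distinct rows. A pure-Python sketch of that relationship follows; distinct_row_indices is hypothetical, whole rows are compared (there is no keys argument), the rows are assumed to contain no nulls or NaNs, and the indices are sorted here only for readability.

```python
def distinct_row_indices(rows, keep="first"):
    """Indices of the rows retained by keep ("first" or "last"), comparing whole rows."""
    chosen = {}  # row value -> index of the retained occurrence
    for i, row in enumerate(rows):
        if keep == "first":
            chosen.setdefault(row, i)  # remember only the first occurrence
        elif keep == "last":
            chosen[row] = i  # later occurrences overwrite earlier ones
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
    return sorted(chosen.values())


rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
idx = distinct_row_indices(rows, keep="last")
print(idx)                     # [2, 3, 4]
print([rows[i] for i in idx])  # [(1, 'a'), (3, 'c'), (2, 'b')]
```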

pylibcudf.stream_compaction.drop_nans(Table source_table, list keys, size_type keep_threshold) → Table

Filters out rows from the input table based on the presence of NaNs.

For details, see cudf::drop_nans().

Parameters:

source_table (Table): The input table to filter.
keys (list[size_type]): The list of column indexes to consider for NaN filtering.
keep_threshold (size_type): The minimum number of non-NaN values among the key columns that a row must have to be kept.

Returns:

Table: A new table with rows removed based on NaNs.
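keep_threshold counts non-NaN values across the key columns only. In the pure-Python model below (not pylibcudf code; drop_nan_rows is hypothetical, and key columns are assumed to be float columns without nulls), a threshold equal to len(keys) drops any row with a NaN in a key column, while a threshold of 1 drops only rows whose key values are all NaN.

```python
import math


def drop_nan_rows(rows, keys, keep_threshold):
    """Keep rows with at least keep_threshold non-NaN values among the key columns."""
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    return [
        row for row in rows
        if sum(not is_nan(row[c]) for c in keys) >= keep_threshold
    ]


nan = float("nan")
rows = [(1.0, 2.0), (nan, 2.0), (nan, nan)]
print(drop_nan_rows(rows, keys=[0, 1], keep_threshold=2))  # [(1.0, 2.0)]
print(drop_nan_rows(rows, keys=[0, 1], keep_threshold=1))  # [(1.0, 2.0), (nan, 2.0)]
```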

pylibcudf.stream_compaction.drop_nulls(Table source_table, list keys, size_type keep_threshold) → Table

Filters out rows from the input table based on the presence of nulls.

For details, see cudf::drop_nulls().

Parameters:

source_table (Table): The input table to filter.
keys (list[size_type]): The list of column indexes to consider for null filtering.
keep_threshold (size_type): The minimum number of non-null values among the key columns that a row must have to be kept.

Returns:

Table: A new table with rows removed based on the null count.
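Only the key columns are inspected, so nulls in other columns never cause a row to be dropped. A pure-Python model of the rule (not pylibcudf code; drop_null_rows is hypothetical, tables are lists of row tuples, and None stands for null):

```python
def drop_null_rows(rows, keys, keep_threshold):
    """Keep rows with at least keep_threshold non-null values among the key columns."""
    return [
        row for row in rows
        if sum(row[c] is not None for c in keys) >= keep_threshold
    ]


rows = [(1, "a", None), (None, "b", 3), (None, None, 4)]
# Only columns 0 and 1 are inspected; the null in column 2 of row 0 is ignored.
print(drop_null_rows(rows, keys=[0, 1], keep_threshold=2))  # [(1, 'a', None)]
print(drop_null_rows(rows, keys=[0, 1], keep_threshold=1))  # [(1, 'a', None), (None, 'b', 3)]
```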

pylibcudf.stream_compaction.stable_distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) → Table

Get the distinct rows from the input table, preserving input order.

For details, see cudf::stable_distinct().

Parameters:

input (Table): The input table to filter.
keys (list): The list of column indexes to consider for distinct filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Table: A new table with distinct rows from the input table, preserving the input table order.
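Preserving input order means the retained rows appear in their original relative positions, whichever row keep selects from each group. The pure-Python model below (not pylibcudf code; stable_distinct_rows is hypothetical and key values are assumed free of nulls and NaNs) makes this visible with a "last" option standing in for KEEP_LAST: the surviving "b" row moves behind "a" because its last occurrence comes later.

```python
def stable_distinct_rows(rows, keys, keep="first"):
    """Distinct rows by key columns, returned in input order.

    keep is "first", "last" or "none" (hypothetical stand-ins for
    KEEP_FIRST, KEEP_LAST and KEEP_NONE).
    """
    if keep not in ("first", "last", "none"):
        raise ValueError(f"unknown keep option: {keep!r}")
    groups = {}  # key tuple -> row indices in input order
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[c] for c in keys), []).append(i)

    kept = set()
    for idx in groups.values():
        if keep == "first":
            kept.add(idx[0])
        elif keep == "last":
            kept.add(idx[-1])
        elif len(idx) == 1:  # keep == "none": drop every duplicated key
            kept.add(idx[0])
    # Filtering by position, like applying a boolean mask, preserves order.
    return [row for i, row in enumerate(rows) if i in kept]


rows = [("b", 1), ("a", 2), ("b", 3), ("c", 4)]
print(stable_distinct_rows(rows, keys=[0]))               # [('b', 1), ('a', 2), ('c', 4)]
print(stable_distinct_rows(rows, keys=[0], keep="last"))  # [('a', 2), ('b', 3), ('c', 4)]
```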

pylibcudf.stream_compaction.unique(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal) → Table

Filters out duplicate consecutive rows from the input table.

For details, see cudf::unique().

Parameters:

input (Table): The input table to filter.
keys (list[int]): The list of column indexes to consider for filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.

Returns:

Table: A new table containing, from each run of consecutive equivalent rows, the rows selected by keep, in the same order as the input table.

Notes

If the input columns to be filtered on are sorted, then unique can produce the same result as stable_distinct, but faster.
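Only adjacent rows are compared, so equal keys separated by other rows survive as separate entries. The pure-Python model below uses itertools.groupby, which likewise groups only consecutive equal items; it is not pylibcudf code, unique_rows is hypothetical, and keys are assumed free of nulls and NaNs. Sorting first makes the result match a first-occurrence stable distinct.

```python
from itertools import groupby


def unique_rows(rows, keys, keep="first"):
    """Collapse runs of consecutive rows whose key columns are equal."""
    if keep not in ("first", "last", "none"):
        raise ValueError(f"unknown keep option: {keep!r}")
    out = []
    for _, run in groupby(rows, key=lambda r: tuple(r[c] for c in keys)):
        run = list(run)
        if keep == "first":
            out.append(run[0])
        elif keep == "last":
            out.append(run[-1])
        elif len(run) == 1:  # keep == "none": drop every run longer than one row
            out.append(run[0])
    return out


rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
print(unique_rows(rows, keys=[0]))                # [('a', 1), ('b', 3), ('a', 4)]
print(unique_rows(rows, keys=[0], keep="none"))   # [('b', 3), ('a', 4)]
print(unique_rows(sorted(rows), keys=[0]))        # [('a', 1), ('b', 3)]
```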

pylibcudf.stream_compaction.unique_count(Column source, null_policy null_handling, nan_policy nan_handling) → size_type

Returns the number of unique consecutive elements in the input column.

For details, see cudf::unique_count().

Parameters:

source (Column): The input column whose unique consecutive elements are counted.
null_handling (null_policy): Whether nulls are included in (INCLUDE) or excluded from (EXCLUDE) the count.
nan_handling (nan_policy): Whether NaNs are treated as nulls (NAN_IS_NULL) or as valid values (NAN_IS_VALID).

Returns:

size_type: The number of unique consecutive elements in the input column.

Notes

If the input column is sorted, then unique_count can produce the same result as distinct_count, but faster.
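unique_count counts runs of consecutive equal elements, while distinct_count counts distinct values anywhere in the column. A pure-Python sketch of both (count_runs and count_distinct_values are hypothetical names, and the column is assumed to hold no nulls or NaNs) shows that the two agree once the column is sorted.

```python
from itertools import groupby


def count_runs(values):
    """Number of runs of consecutive equal elements (models unique_count)."""
    return sum(1 for _ in groupby(values))


def count_distinct_values(values):
    """Number of distinct elements regardless of position (models distinct_count)."""
    return len(set(values))


values = [3, 3, 1, 3, 2, 2]
print(count_runs(values), count_distinct_values(values))                  # 4 3
print(count_runs(sorted(values)), count_distinct_values(sorted(values)))  # 3 3
```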