stream_compaction
- pylibcudf.stream_compaction.DuplicateKeepOption
See also cudf::duplicate_keep_option.
Enum members
KEEP_ANY
KEEP_FIRST
KEEP_LAST
KEEP_NONE
- pylibcudf.stream_compaction.apply_boolean_mask(Table source_table, Column boolean_mask) -> Table
Filters out rows from the input table based on a boolean mask.
For details, see apply_boolean_mask().
- Parameters:
- source_table : Table
The input table to filter.
- boolean_mask : Column
The boolean mask to apply to the input table.
- Returns:
- Table
A new table with rows removed based on the boolean mask.
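The filtering rule can be modelled in plain Python. This is a sketch of the semantics only, not pylibcudf code: the helper name `apply_boolean_mask_model` is hypothetical, columns are represented as Python lists, and `None` stands in for a null mask entry, which is assumed here to drop the row just as `False` does.

```python
def apply_boolean_mask_model(columns, mask):
    """Keep row i of every column only where mask[i] is True.

    `columns` is a list of equal-length lists (one list per column).
    `mask` is a list of True/False/None; None models a null mask entry.
    """
    if any(len(col) != len(mask) for col in columns):
        raise ValueError("mask length must match the number of rows")
    # Rows whose mask entry is False or None are filtered out.
    return [[v for v, m in zip(col, mask) if m is True] for col in columns]


table = [[1, 2, 3, 4], ["a", "b", "c", "d"]]
mask = [True, False, None, True]
filtered = apply_boolean_mask_model(table, mask)
print(filtered)  # [[1, 4], ['a', 'd']]
```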
- pylibcudf.stream_compaction.distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) -> Table
Get the distinct rows from the input table.
For details, see distinct().
- Parameters:
- input : Table
The input table to filter.
- keys : list
The list of column indexes to consider for distinct filtering.
- keep : duplicate_keep_option
The option to specify which rows to keep in the case of duplicates.
- nulls_equal : null_equality
The option to specify how nulls are handled in the comparison.
- nans_equal : nan_equality
The option to specify how NaNs are handled in the comparison.
- Returns:
- Table
A new table with distinct rows from the input table. The output will not necessarily be in the same order as the input.
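To make the effect of the keep option concrete, here is a small pure-Python model. It is an illustration under assumptions, not the library's implementation: `distinct_model` is a hypothetical helper, rows are tuples, the option is passed as the enum member name, KEEP_ANY is arbitrarily modelled as keeping the first row, and null and NaN equality are not modelled. The model returns rows in input order for readability, which distinct itself does not guarantee.

```python
from collections import defaultdict


def distinct_model(rows, keys, keep="KEEP_ANY"):
    """Return the rows that are distinct with respect to the key columns."""
    if keep not in {"KEEP_ANY", "KEEP_FIRST", "KEEP_LAST", "KEEP_NONE"}:
        raise ValueError(f"unknown keep option: {keep}")

    groups = defaultdict(list)  # key tuple -> row positions, in input order
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in keys)].append(i)

    kept = []
    for positions in groups.values():
        if keep == "KEEP_NONE":
            if len(positions) == 1:  # rows that have duplicates are all dropped
                kept.append(positions[0])
        elif keep == "KEEP_LAST":
            kept.append(positions[-1])
        else:  # KEEP_FIRST, and KEEP_ANY in this model
            kept.append(positions[0])
    return [rows[i] for i in sorted(kept)]


rows = [(1, "x"), (2, "y"), (1, "z"), (3, "w")]
first = distinct_model(rows, [0], "KEEP_FIRST")
last = distinct_model(rows, [0], "KEEP_LAST")
none = distinct_model(rows, [0], "KEEP_NONE")
print(first)  # [(1, 'x'), (2, 'y'), (3, 'w')]
print(last)   # [(2, 'y'), (1, 'z'), (3, 'w')]
print(none)   # [(2, 'y'), (3, 'w')]
```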
- pylibcudf.stream_compaction.distinct_count(Column source, null_policy null_handling, nan_policy nan_handling) -> size_type
Returns the number of distinct elements in the input column.
For details, see distinct_count().
- Parameters:
- source : Column
The input column to count the unique elements of.
- null_handling : null_policy
Flag to include or exclude nulls from the count.
- nan_handling : nan_policy
Flag to include or exclude NaNs from the count.
- Returns:
- size_type
The number of distinct elements in the input column.
- pylibcudf.stream_compaction.distinct_indices(Table input, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) -> Column
Get the indices of the distinct rows from the input table.
For details, see distinct_indices().
- Parameters:
- input : Table
The input table to filter.
- keep : duplicate_keep_option
The option to specify which rows to keep in the case of duplicates.
- nulls_equal : null_equality
The option to specify how nulls are handled in the comparison.
- nans_equal : nan_equality
The option to specify how NaNs are handled in the comparison.
- Returns:
- Column
A new column with the indices of the distinct rows from the input table.
- pylibcudf.stream_compaction.drop_nans(Table source_table, list keys, size_type keep_threshold) -> Table
Filters out rows from the input table based on the presence of NaNs.
For details, see drop_nans().
- Parameters:
- source_table : Table
The input table to filter.
- keys : List[size_type]
The list of column indexes to consider for NaN filtering.
- keep_threshold : size_type
The minimum number of non-NaNs required to keep a row.
- Returns:
- Table
A new table with rows removed based on NaNs.
- pylibcudf.stream_compaction.drop_nulls(Table source_table, list keys, size_type keep_threshold) -> Table
Filters out rows from the input table based on the presence of nulls.
For details, see drop_nulls().
- Parameters:
- source_table : Table
The input table to filter.
- keys : List[size_type]
The list of column indexes to consider for null filtering.
- keep_threshold : size_type
The minimum number of non-nulls required to keep a row.
- Returns:
- Table
A new table with rows removed based on the null count.
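The role of keep_threshold is easiest to see on a tiny example. The following is a plain-Python sketch of the rule described above, not pylibcudf code: `drop_nulls_model` is a hypothetical name, rows are tuples, and `None` stands in for a null value. A row is kept when it has at least keep_threshold non-null values among the key columns.

```python
def drop_nulls_model(rows, keys, keep_threshold):
    """Keep rows with at least `keep_threshold` non-null values in `keys`."""
    def non_null_count(row):
        return sum(1 for k in keys if row[k] is not None)

    return [row for row in rows if non_null_count(row) >= keep_threshold]


rows = [(1, "a"), (None, "b"), (None, None)]

# Threshold equal to the number of keys: any null in the keys drops the row.
strict = drop_nulls_model(rows, keys=[0, 1], keep_threshold=2)
# Threshold of 1: a row survives if at least one key value is non-null.
lenient = drop_nulls_model(rows, keys=[0, 1], keep_threshold=1)

print(strict)   # [(1, 'a')]
print(lenient)  # [(1, 'a'), (None, 'b')]
```

Restricting `keys` changes which columns are inspected: with `keys=[1]` and a threshold of 1, only the last row in this example would be dropped.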
- pylibcudf.stream_compaction.stable_distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal) -> Table
Get the distinct rows from the input table, preserving input order.
For details, see stable_distinct().
- Parameters:
- input : Table
The input table to filter.
- keys : list
The list of column indexes to consider for distinct filtering.
- keep : duplicate_keep_option
The option to specify which rows to keep in the case of duplicates.
- nulls_equal : null_equality
The option to specify how nulls are handled in the comparison.
- nans_equal : nan_equality
The option to specify how NaNs are handled in the comparison.
- Returns:
- Table
A new table with distinct rows from the input table, preserving the input table order.
- pylibcudf.stream_compaction.unique(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal) -> Table
Filter duplicate consecutive rows from the input table.
For details, see unique().
- Parameters:
- input : Table
The input table to filter.
- keys : list[int]
The list of column indexes to consider for filtering.
- keep : duplicate_keep_option
The option to specify which rows to keep in the case of duplicates.
- nulls_equal : null_equality
The option to specify how nulls are handled in the comparison.
- Returns:
- Table
A new table with the rows retained from each sequence of consecutive equivalent rows, as specified by keep. Rows appear in the same order as in the input table.
Notes
If the input columns to be filtered on are sorted, then unique can produce the same result as stable_distinct, but faster.
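The note about sorted input follows from unique only comparing neighbouring rows. A minimal pure-Python model shows the difference; it is a sketch under assumptions, not the library's implementation: `unique_model` is a hypothetical helper, rows are tuples, the option is passed as the enum member name, KEEP_ANY is modelled as keeping the first row of a run, and null equality is not modelled.

```python
from itertools import groupby


def unique_model(rows, keys, keep="KEEP_FIRST"):
    """Collapse each run of consecutive rows that agree on the key columns."""
    out = []
    for _, run in groupby(rows, key=lambda row: tuple(row[k] for k in keys)):
        run = list(run)
        if keep == "KEEP_NONE":
            if len(run) == 1:  # runs with duplicates are dropped entirely
                out.append(run[0])
        elif keep == "KEEP_LAST":
            out.append(run[-1])
        else:  # KEEP_FIRST, and KEEP_ANY in this model
            out.append(run[0])
    return out


unsorted_rows = [(2, "a"), (1, "b"), (2, "c"), (2, "d"), (1, "e")]
sorted_rows = sorted(unsorted_rows, key=lambda row: row[0])

# Unsorted input: equal keys that are not adjacent survive as separate rows.
on_unsorted = unique_model(unsorted_rows, [0])
# Sorted input: all equal keys are adjacent, so each key appears once.
on_sorted = unique_model(sorted_rows, [0])

print(on_unsorted)  # [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'e')]
print(on_sorted)    # [(1, 'b'), (2, 'a')]
```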
- pylibcudf.stream_compaction.unique_count(Column source, null_policy null_handling, nan_policy nan_handling) -> size_type
Returns the number of unique consecutive elements in the input column.
For details, see unique_count().
- Parameters:
- source : Column
The input column to count the unique elements of.
- null_handling : null_policy
Flag to include or exclude nulls from the count.
- nan_handling : nan_policy
Flag to include or exclude NaNs from the count.
- Returns:
- size_type
The number of unique consecutive elements in the input column.
Notes
If the input column is sorted, then unique_count can produce the same result as distinct_count, but faster.
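The relationship between the two counts can be checked with a short pure-Python model. This is only an illustration of the counting rules: both helper names are hypothetical, the column is a plain list, and null and NaN handling are not modelled.

```python
from itertools import groupby


def unique_count_model(values):
    """Count runs of consecutive equal values."""
    return sum(1 for _ in groupby(values))


def distinct_count_model(values):
    """Count distinct values regardless of position."""
    return len(set(values))


values = [3, 1, 1, 3, 2, 2]

runs_unsorted = unique_count_model(values)        # runs: 3 | 1 1 | 3 | 2 2
runs_sorted = unique_count_model(sorted(values))  # runs: 1 1 | 2 2 | 3 3
distinct = distinct_count_model(values)

print(runs_unsorted, runs_sorted, distinct)  # 4 3 3
```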