stream_compaction

pylibcudf.stream_compaction.DuplicateKeepOption

See also duplicate_keep_option.

Enum members

KEEP_ANY
KEEP_FIRST
KEEP_LAST
KEEP_NONE

pylibcudf.stream_compaction.apply_boolean_mask(Table source_table, Column boolean_mask, Stream stream=None, DeviceMemoryResource mr=None) → Table

Filters out rows from the input table based on a boolean mask.

For details, see apply_boolean_mask().

Parameters:

source_table (Table): The input table to filter.
boolean_mask (Column): The boolean mask to apply to the input table.

Returns:

Table: A new table with rows removed based on the boolean mask.
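
As a rough illustration of the row filtering described above, here is a plain-Python model that does not call pylibcudf. Lists of lists stand in for a Table, a list of booleans stands in for the mask Column, and the name `apply_boolean_mask_model` is hypothetical. Dropping rows whose mask entry is null (None here) is an assumption of this sketch, not something this page states.

```python
def apply_boolean_mask_model(columns, mask):
    """Pure-Python model: keep row i of every column when mask[i] is True.

    `columns` is a list of equal-length lists standing in for a Table, and
    `mask` is a list of True/False/None standing in for a boolean Column.
    Treating None (null) as "drop the row" is an assumption of this sketch.
    """
    if any(len(col) != len(mask) for col in columns):
        raise ValueError("mask length must match the number of rows")
    return [
        [value for value, keep in zip(col, mask) if keep is True]
        for col in columns
    ]


table = [[1, 2, 3, 4], ["a", "b", "c", "d"]]
mask = [True, False, None, True]
filtered = apply_boolean_mask_model(table, mask)
print(filtered)  # [[1, 4], ['a', 'd']]
```

The same mask is applied to every column, so the rows stay aligned across the output table.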

pylibcudf.stream_compaction.distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) → Table

Get the distinct rows from the input table.

For details, see distinct().

Parameters:

input (Table): The input table to filter.
keys (list): The list of column indexes to consider for distinct filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Table: A new table with distinct rows from the input table. The output will not necessarily be in the same order as the input.

pylibcudf.stream_compaction.distinct_indices(Table input, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) → Column

Get the indices of the distinct rows from the input table.

For details, see distinct_indices().

Parameters:

input (Table): The input table to filter.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Column: A new column with the indices of the distinct rows from the input table.
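
To make the effect of the keep option concrete, the following plain-Python sketch models how one index per group of equal rows might be chosen. It does not call pylibcudf: tuples stand in for table rows, the option is passed by member name as a string, and `distinct_indices_model` is a hypothetical helper. The reading of each member (first occurrence, last occurrence, no occurrence of a duplicated row, any occurrence) is an assumption based on the member names; null and NaN equality are not modelled.

```python
from collections import defaultdict


def distinct_indices_model(rows, keep="KEEP_ANY"):
    """Pure-Python model of choosing one index per group of equal rows.

    `rows` is a list of hashable tuples standing in for table rows, and
    `keep` is the name of a DuplicateKeepOption member. Null and NaN
    equality are not modelled; Python's == decides which rows match.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row].append(index)

    kept = []
    for indices in groups.values():
        if keep in ("KEEP_FIRST", "KEEP_ANY"):
            # KEEP_ANY may pick any member; this sketch picks the first.
            kept.append(indices[0])
        elif keep == "KEEP_LAST":
            kept.append(indices[-1])
        elif keep == "KEEP_NONE":
            # Only rows that have no duplicate at all survive.
            if len(indices) == 1:
                kept.append(indices[0])
        else:
            raise ValueError(f"unknown keep option: {keep}")
    # Sorted for readability; this page does not state an output order.
    return sorted(kept)


rows = [(1, "x"), (2, "y"), (1, "x"), (3, "z"), (2, "y")]
first = distinct_indices_model(rows, "KEEP_FIRST")
last = distinct_indices_model(rows, "KEEP_LAST")
none = distinct_indices_model(rows, "KEEP_NONE")
print(first, last, none)  # [0, 1, 3] [2, 3, 4] [3]
```

Gathering the input rows at these indices gives the table that the distinct functions would return for the same keep option, up to row order.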

pylibcudf.stream_compaction.drop_nans(Table source_table, list keys, size_type keep_threshold, Stream stream=None, DeviceMemoryResource mr=None) → Table

Filters out rows from the input table based on the presence of NaNs.

For details, see drop_nans().

Parameters:

source_table (Table): The input table to filter.
keys (List[size_type]): The list of column indexes to consider for NaN filtering.
keep_threshold (size_type): The minimum number of non-NaNs required to keep a row.

Returns:

Table: A new table with rows removed based on NaNs.

pylibcudf.stream_compaction.drop_nulls(Table source_table, list keys, size_type keep_threshold, Stream stream=None, DeviceMemoryResource mr=None) → Table

Filters out rows from the input table based on the presence of nulls.

For details, see drop_nulls().

Parameters:

source_table (Table): The input table to filter.
keys (List[size_type]): The list of column indexes to consider for null filtering.
keep_threshold (size_type): The minimum number of non-nulls required to keep a row.

Returns:

Table: A new table with rows removed based on the null count.
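
The keep_threshold rule shared by drop_nans and drop_nulls can be sketched in plain Python. This is a model of the documented behaviour and does not call pylibcudf; `drop_rows_model`, `is_null`, and `is_nan` are hypothetical names, lists of lists stand in for a Table, and None stands in for a null.

```python
import math


def drop_rows_model(columns, keys, keep_threshold, is_missing):
    """Keep row i when at least `keep_threshold` key columns are not missing."""
    num_rows = len(columns[0]) if columns else 0
    kept_rows = [
        i
        for i in range(num_rows)
        if sum(not is_missing(columns[k][i]) for k in keys) >= keep_threshold
    ]
    # Every column is gathered with the same row list, keys or not.
    return [[col[i] for i in kept_rows] for col in columns]


def is_null(value):
    # None stands in for a null entry in this sketch.
    return value is None


def is_nan(value):
    return isinstance(value, float) and math.isnan(value)


table = [
    [1.0, None, 3.0, None],
    [10.0, 20.0, None, None],
    ["a", "b", "c", "d"],
]
# Rows need one non-null value among columns 0 and 1: only row 3 is dropped.
at_least_one = drop_rows_model(table, keys=[0, 1], keep_threshold=1, is_missing=is_null)
# Rows need both key columns to be non-null: only row 0 survives.
both_present = drop_rows_model(table, keys=[0, 1], keep_threshold=2, is_missing=is_null)
# The same rule with a NaN check instead of a null check.
no_nans = drop_rows_model([[1.0, float("nan"), 3.0]], keys=[0], keep_threshold=1, is_missing=is_nan)
```

With keep_threshold equal to the number of keys, a row must be fully populated across the key columns to survive; with a threshold of 1, a single populated key is enough.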

pylibcudf.stream_compaction.filter(Table predicate_table, Expression predicate_expr, Table filter_table, Stream stream=None, DeviceMemoryResource mr=None) → Table

Filters a table using a predicate expression.

For details, see filter().

Parameters:

predicate_table (Table): The table used for predicate expression evaluation.
predicate_expr (Expression): The predicate filter expression.
filter_table (Table): The table to be filtered.

Returns:

Table: The filtered table.

pylibcudf.stream_compaction.stable_distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) → Table

Get the distinct rows from the input table, preserving input order.

For details, see stable_distinct().

Parameters:

input (Table): The input table to filter.
keys (list): The list of column indexes to consider for distinct filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.
nans_equal (nan_equality): The option to specify how NaNs are handled in the comparison.

Returns:

Table: A new table with distinct rows from the input table, preserving the input table order.

pylibcudf.stream_compaction.unique(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, Stream stream=None, DeviceMemoryResource mr=None) → Table

Filter duplicate consecutive rows from the input table.

For details, see unique().

Parameters:

input (Table): The input table to filter.
keys (list[int]): The list of column indexes to consider for filtering.
keep (duplicate_keep_option): The option to specify which rows to keep in the case of duplicates.
nulls_equal (null_equality): The option to specify how nulls are handled in the comparison.

Returns:

Table: A new table containing the unique rows from each sequence of consecutive equivalent rows, as specified by keep. Rows appear in the same order as in the input table.

Notes

If the input columns to be filtered on are sorted, then unique can produce the same result as stable_distinct, but faster.
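
The note above can be checked with a small plain-Python model that does not call pylibcudf. Integers stand in for rows, both helpers keep the first occurrence only, and the names `unique_model` and `stable_distinct_model` are hypothetical.

```python
from itertools import groupby


def unique_model(rows):
    """Keep the first row of each run of consecutive equal rows."""
    return [key for key, _run in groupby(rows)]


def stable_distinct_model(rows):
    """Keep the first occurrence of each row, preserving input order."""
    return list(dict.fromkeys(rows))


unsorted_rows = [3, 1, 3, 3, 2, 1]
sorted_rows = sorted(unsorted_rows)

# On unsorted input, equal rows that are not adjacent survive unique.
print(unique_model(unsorted_rows))           # [3, 1, 3, 2, 1]
print(stable_distinct_model(unsorted_rows))  # [3, 1, 2]

# On sorted input, every duplicate is adjacent, so the results agree.
print(unique_model(sorted_rows))             # [1, 2, 3]
print(stable_distinct_model(sorted_rows))    # [1, 2, 3]
```

The model only compares each row with its predecessor, which is why sorting the key columns first is what makes the two results coincide.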
