stream_compaction

pylibcudf.stream_compaction.DuplicateKeepOption

See also duplicate_keep_option.

Enum members

  • KEEP_ANY

  • KEEP_FIRST

  • KEEP_LAST

  • KEEP_NONE

pylibcudf.stream_compaction.apply_boolean_mask(Table source_table, Column boolean_mask, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Filters out rows from the input table based on a boolean mask.

For details, see apply_boolean_mask().

Parameters:
source_table : Table

The input table to filter.

boolean_mask : Column

The boolean mask to apply to the input table.

Returns:
Table

A new table with rows removed based on the boolean mask.
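The row-selection rule can be illustrated with a small pure-Python model. This is a sketch of the semantics only, not pylibcudf code: `apply_boolean_mask_model` is a hypothetical helper, columns are plain lists, and it assumes (following the libcudf description of apply_boolean_mask) that a row is kept only where the mask element is non-null and true.

```python
def apply_boolean_mask_model(columns, mask):
    """Pure-Python model: keep row i only when mask[i] is True.

    `columns` is a list of equal-length lists (one per column).
    A mask entry of None stands in for a null and is treated as False
    (an assumption taken from the libcudf description).
    """
    if any(len(col) != len(mask) for col in columns):
        raise ValueError("mask length must match the number of rows")
    keep = [i for i, m in enumerate(mask) if m is True]
    return [[col[i] for i in keep] for col in columns]


table = [[1, 2, 3, 4], ["a", "b", "c", "d"]]
mask = [True, False, None, True]
filtered = apply_boolean_mask_model(table, mask)
```

In this model only the first and last rows survive, because the second mask entry is false and the third is null.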

pylibcudf.stream_compaction.distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Get the distinct rows from the input table.

For details, see distinct().

Parameters:
input : Table

The input table to filter.

keys : list

The list of column indexes to consider for distinct filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Table

A new table with distinct rows from the input table. The output will not necessarily be in the same order as the input.

pylibcudf.stream_compaction.distinct_indices(Table input, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) -> Column

Get the indices of the distinct rows from the input table.

For details, see distinct_indices().

Parameters:
input : Table

The input table to filter.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Column

A new column with the indices of the distinct rows from the input table.

pylibcudf.stream_compaction.drop_nans(Table source_table, list keys, size_type keep_threshold, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Filters out rows from the input table based on the presence of NaNs.

For details, see drop_nans().

Parameters:
source_table : Table

The input table to filter.

keys : List[size_type]

The list of column indexes to consider for NaN filtering.

keep_threshold : size_type

The minimum number of non-NaNs required to keep a row.

Returns:
Table

A new table with rows removed based on NaNs.
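The effect of keep_threshold can be sketched in plain Python. The helper `drop_nans_model` below is hypothetical and only models the rule stated above (a row is kept when at least keep_threshold of its key values are not NaN); rows are tuples of floats, and nulls are deliberately left out of the model.

```python
import math


def drop_nans_model(rows, keys, keep_threshold):
    """Keep a row when at least keep_threshold of its key values are not NaN."""
    def non_nan_count(row):
        return sum(not math.isnan(row[k]) for k in keys)

    return [row for row in rows if non_nan_count(row) >= keep_threshold]


nan = float("nan")
rows = [(1.0, 2.0), (nan, 3.0), (nan, nan)]

# Require both key columns to be non-NaN.
strict = drop_nans_model(rows, keys=[0, 1], keep_threshold=2)
# Require at least one non-NaN key value.
lenient = drop_nans_model(rows, keys=[0, 1], keep_threshold=1)
```

With a threshold equal to the number of keys, any NaN in a key column drops the row; with a threshold of 1, only rows whose key values are all NaN are dropped.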

pylibcudf.stream_compaction.drop_nulls(Table source_table, list keys, size_type keep_threshold, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Filters out rows from the input table based on the presence of nulls.

For details, see drop_nulls().

Parameters:
source_table : Table

The input table to filter.

keys : List[size_type]

The list of column indexes to consider for null filtering.

keep_threshold : size_type

The minimum number of non-nulls required to keep a row.

Returns:
Table

A new table with rows removed based on the null count.
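As a rough illustration of how keys and keep_threshold interact, here is a pure-Python model. `drop_nulls_model` is a hypothetical helper, not part of pylibcudf; rows are tuples and None stands in for a null.

```python
def drop_nulls_model(rows, keys, keep_threshold):
    """Keep a row when at least keep_threshold of its key values are non-null."""
    def non_null_count(row):
        return sum(row[k] is not None for k in keys)

    return [row for row in rows if non_null_count(row) >= keep_threshold]


rows = [
    (1, 1.0, "x"),
    (None, 2.0, "y"),
    (None, None, "z"),
]

# Only columns 0 and 1 are inspected; column 2 is carried along untouched.
all_keys_valid = drop_nulls_model(rows, keys=[0, 1], keep_threshold=2)
any_key_valid = drop_nulls_model(rows, keys=[0, 1], keep_threshold=1)
```

Setting keep_threshold to the number of keys behaves like "drop rows with any null key", while a threshold of 1 behaves like "drop rows whose keys are all null".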

pylibcudf.stream_compaction.filter(Table predicate_table, Expression predicate_expr, Table filter_table, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Filters a table using a predicate expression.

For details, see filter().

Parameters:
predicate_table : Table

The table used for predicate expression evaluation.

predicate_expr : Expression

The predicate filter expression.

filter_table : Table

The table to be filtered.

Returns:
Table

The filtered table.
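The split between the table the predicate reads and the table that gets filtered can be sketched as follows. This is a loose pure-Python model under stated assumptions, not the pylibcudf API: `filter_model` is hypothetical, a Python callable stands in for the Expression, and the model assumes the predicate is evaluated row by row on predicate_table and that the matching rows of filter_table are returned.

```python
def filter_model(predicate_rows, predicate, filter_rows):
    """Keep filter_rows[i] when predicate(predicate_rows[i]) is truthy.

    Assumption: both tables have the same number of rows and the
    predicate is evaluated one row at a time.
    """
    if len(predicate_rows) != len(filter_rows):
        raise ValueError("both tables must have the same number of rows")
    return [
        kept
        for probe, kept in zip(predicate_rows, filter_rows)
        if predicate(probe)
    ]


# The predicate inspects (price, quantity); the output keeps only the names.
predicate_rows = [(10.0, 3), (2.5, 0), (7.0, 1)]
filter_rows = [("apple",), ("pear",), ("plum",)]
in_stock = filter_model(predicate_rows, lambda row: row[1] > 0, filter_rows)
```

The point of the two-table shape in this model is that the columns used to decide which rows survive do not have to appear in the result.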

pylibcudf.stream_compaction.stable_distinct(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, nan_equality nans_equal, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Get the distinct rows from the input table, preserving input order.

For details, see stable_distinct().

Parameters:
input : Table

The input table to filter.

keys : list

The list of column indexes to consider for distinct filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

nans_equal : nan_equality

The option to specify how NaNs are handled in the comparison.

Returns:
Table

A new table with distinct rows from the input table, preserving the input table order.
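How the keep option changes an order-preserving result can be modelled in plain Python. `stable_distinct_model` is a hypothetical helper that uses the enum member names as strings; it ignores the nulls_equal and nans_equal options, and it treats KEEP_ANY like KEEP_FIRST, which is just one valid choice since KEEP_ANY leaves the selected row unspecified.

```python
from collections import Counter


def stable_distinct_model(rows, keys, keep="KEEP_FIRST"):
    """Return rows that are distinct on `keys`, in input order."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    if keep == "KEEP_NONE":
        # Drop every row whose key occurs more than once.
        counts = Counter(key_of(row) for row in rows)
        return [row for row in rows if counts[key_of(row)] == 1]
    if keep in ("KEEP_FIRST", "KEEP_ANY"):
        seen = set()
        out = []
        for row in rows:
            if key_of(row) not in seen:
                seen.add(key_of(row))
                out.append(row)
        return out
    if keep == "KEEP_LAST":
        # Remember the last position of each key, then keep only those rows.
        last = {key_of(row): i for i, row in enumerate(rows)}
        return [row for i, row in enumerate(rows) if last[key_of(row)] == i]
    raise ValueError(f"unknown keep option: {keep}")


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
first = stable_distinct_model(rows, keys=[0], keep="KEEP_FIRST")
last = stable_distinct_model(rows, keys=[0], keep="KEEP_LAST")
none = stable_distinct_model(rows, keys=[0], keep="KEEP_NONE")
```

In every case the surviving rows appear in the same relative order as in the input, which is the property that distinguishes stable_distinct from distinct.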

pylibcudf.stream_compaction.unique(Table input, list keys, duplicate_keep_option keep, null_equality nulls_equal, Stream stream=None, DeviceMemoryResource mr=None) -> Table

Filter duplicate consecutive rows from the input table.

For details, see unique().

Parameters:
input : Table

The input table to filter.

keys : list[int]

The list of column indexes to consider for filtering.

keep : duplicate_keep_option

The option to specify which rows to keep in the case of duplicates.

nulls_equal : null_equality

The option to specify how nulls are handled in the comparison.

Returns:
Table

A new table containing one row from each sequence of consecutive equivalent rows, chosen as specified by keep. Rows appear in the same order as in the input table.

Notes

If the input columns to be filtered on are sorted, then unique can produce the same result as stable_distinct, but faster.
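The difference between removing consecutive duplicates and removing all duplicates can be seen in a small pure-Python model. Both helpers are hypothetical and only mirror the behaviour described above: `unique_model` collapses runs of adjacent equal keys, and `first_occurrences` stands in for an order-preserving distinct that keeps the first row per key.

```python
from itertools import groupby


def unique_model(rows, keys, keep="KEEP_FIRST"):
    """Collapse each run of consecutive rows that are equal on `keys`."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    out = []
    for _, run in groupby(rows, key=key_of):
        run = list(run)
        if keep in ("KEEP_FIRST", "KEEP_ANY"):
            out.append(run[0])
        elif keep == "KEEP_LAST":
            out.append(run[-1])
        elif keep == "KEEP_NONE":
            if len(run) == 1:
                out.append(run[0])
        else:
            raise ValueError(f"unknown keep option: {keep}")
    return out


def first_occurrences(rows, keys):
    """Order-preserving distinct that keeps the first row for each key."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


unsorted_rows = [(1, "a"), (1, "b"), (2, "c"), (1, "d")]
sorted_rows = sorted(unsorted_rows, key=lambda row: row[0])

on_unsorted = unique_model(unsorted_rows, keys=[0])
on_sorted = unique_model(sorted_rows, keys=[0])
```

On the unsorted input the key 1 appears in two separate runs, so it survives twice. Once the rows are sorted on the key column, equal keys are adjacent and the run-based model agrees with the distinct model.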