Stream Compaction

Files

file  stream_compaction.hpp
 Column APIs for filtering rows.
 

Enumerations

enum  cudf::duplicate_keep_option { cudf::duplicate_keep_option::KEEP_FIRST = 0, cudf::duplicate_keep_option::KEEP_LAST, cudf::duplicate_keep_option::KEEP_NONE }
 Choices for which duplicate rows the drop_duplicates API retains.
 

Functions

std::unique_ptr< table > cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filters a table to remove null elements with threshold count.
 
std::unique_ptr< table > cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filters a table to remove null elements.
 
std::unique_ptr< table > cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filters a table to remove NANs with threshold count.
 
std::unique_ptr< table > cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filters a table to remove NANs.
 
std::unique_ptr< table > cudf::apply_boolean_mask (table_view const &input, column_view const &boolean_mask, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filters input using boolean_mask of boolean values as a mask.
 
std::unique_ptr< table > cudf::drop_duplicates (table_view const &input, std::vector< size_type > const &keys, duplicate_keep_option keep, null_equality nulls_equal=null_equality::EQUAL, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Create a new table without duplicate rows.
 
cudf::size_type cudf::distinct_count (column_view const &input, null_policy null_handling, nan_policy nan_handling)
 Count the unique elements in the column_view.
 
cudf::size_type cudf::distinct_count (table_view const &input, null_equality nulls_equal=null_equality::EQUAL)
 Count the unique rows in a table.
 

Detailed Description

Enumeration Type Documentation

◆ duplicate_keep_option

Choices for which duplicate rows the drop_duplicates API retains.

Enumerator
KEEP_FIRST 

Keeps first duplicate row and unique rows.

KEEP_LAST 

Keeps last duplicate row and unique rows.

KEEP_NONE 

Keeps only unique rows; no duplicate rows are kept.

Definition at line 210 of file stream_compaction.hpp.

Function Documentation

◆ apply_boolean_mask()

std::unique_ptr<table> cudf::apply_boolean_mask ( table_view const &  input,
column_view const &  boolean_mask,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filters input using boolean_mask of boolean values as a mask.

Given an input table_view and a mask column_view, an element i from each column_view of the input is copied to the corresponding output column if the corresponding element i in the mask is non-null and true. This operation is stable: the input order is preserved.

Note
If input.num_rows() is zero, there is no error, and an empty table is returned.
Exceptions
cudf::logic_error  if the input size and the boolean_mask size do not match.
cudf::logic_error  if boolean_mask is not of type type_id::BOOL8.
Parameters
[in]  input  The input table_view to filter.
[in]  boolean_mask  A nullable column_view of type type_id::BOOL8 used as a mask to filter the input.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table containing copy of all rows of input passing the filter defined by boolean_mask.
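
The masking rule above can be illustrated with a small host-side model. This is only a sketch in plain C++ and not the libcudf implementation: it works on a single std::vector instead of a device table, uses std::optional<bool> to stand in for a nullable BOOL8 element, and the name apply_boolean_mask_model is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the masking rule; std::nullopt stands for a null mask entry.
template <typename T>
std::vector<T> apply_boolean_mask_model(std::vector<T> const& column,
                                        std::vector<std::optional<bool>> const& mask)
{
  // Mirrors the documented size check, using std::logic_error in place of cudf::logic_error.
  if (column.size() != mask.size()) { throw std::logic_error("column and mask sizes differ"); }

  std::vector<T> out;
  for (std::size_t i = 0; i < column.size(); ++i) {
    // Element i is copied only when mask entry i is non-null and true.
    if (mask[i].has_value() && *mask[i]) { out.push_back(column[i]); }
  }
  return out;  // elements appear in input order, so the filter is stable
}
```

With column {10, 20, 30, 40} and mask {true, null, false, true}, the model keeps 10 and 40: the null mask entry drops its row just as a false entry does.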

◆ distinct_count() [1/2]

cudf::size_type cudf::distinct_count ( column_view const &  input,
null_policy  null_handling,
nan_policy  nan_handling 
)

Count the unique elements in the column_view.

Given an input column_view, returns the number of unique elements in this column_view.

If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_NULL, both NaN and null values are ignored. If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_VALID, only null is ignored, NaN is considered in unique count.

Parameters
[in]  input  The column_view whose unique elements will be counted.
[in]  null_handling  Flag to include or ignore null while counting.
[in]  nan_handling  Flag to consider NaN==null or not.
Returns
number of unique elements
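
How the two policy flags interact can be sketched with a host-side model. This is an illustration in plain C++, not the libcudf implementation. The enums and the function name are hypothetical stand-ins for cudf::null_policy, cudf::nan_policy and distinct_count. Two behaviours are assumptions that the text above does not state: all NaNs are counted as a single distinct value, and under INCLUDE all nulls are counted as a single distinct value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-ins for cudf::null_policy and cudf::nan_policy.
enum class null_policy_model { EXCLUDE, INCLUDE };
enum class nan_policy_model { NAN_IS_NULL, NAN_IS_VALID };

// Host-side model; std::nullopt stands for a null element.
std::size_t distinct_count_model(std::vector<std::optional<double>> const& column,
                                 null_policy_model null_handling,
                                 nan_policy_model nan_handling)
{
  std::set<double> values;  // distinct non-null, non-NaN values
  bool saw_null = false;
  bool saw_nan  = false;
  for (auto const& element : column) {
    if (!element.has_value()) {
      saw_null = true;
    } else if (std::isnan(*element)) {
      // NAN_IS_NULL folds NaN into the null bucket; NAN_IS_VALID keeps it as a value.
      if (nan_handling == nan_policy_model::NAN_IS_NULL) {
        saw_null = true;
      } else {
        saw_nan = true;
      }
    } else {
      values.insert(*element);
    }
  }
  std::size_t count = values.size();
  if (saw_nan) { ++count; }  // assumption: all NaNs count as one value
  if (saw_null && null_handling == null_policy_model::INCLUDE) { ++count; }  // assumption: all nulls count as one value
  return count;
}
```

For the column {1.0, 2.0, 2.0, null, NaN, NaN}, the model returns 2 with EXCLUDE and NAN_IS_NULL (nulls and NaNs are both ignored) and 3 with EXCLUDE and NAN_IS_VALID (only the null is ignored).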

◆ distinct_count() [2/2]

cudf::size_type cudf::distinct_count ( table_view const &  input,
null_equality  nulls_equal = null_equality::EQUAL 
)

Count the unique rows in a table.

Parameters
[in]  input  Table whose unique rows will be counted.
[in]  nulls_equal  Flag denoting whether null elements are considered equal: nulls are equal if null_equality::EQUAL and not equal if null_equality::UNEQUAL.
Returns
number of unique rows in the table
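
The effect of nulls_equal on the row count can be sketched as follows. This is a host-side model in plain C++ under stated assumptions, not the libcudf implementation: rows are vectors of std::optional<int>, the enum and function names are hypothetical, and UNEQUAL is modelled by treating every row that contains a null as distinct from all other rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

using row_t = std::vector<std::optional<int>>;  // std::nullopt stands for a null

// Hypothetical stand-in for cudf::null_equality.
enum class null_equality_model { EQUAL, UNEQUAL };

std::size_t distinct_row_count_model(std::vector<row_t> const& rows,
                                     null_equality_model nulls_equal = null_equality_model::EQUAL)
{
  std::set<row_t> distinct_rows;  // std::nullopt compares equal to std::nullopt here
  std::size_t rows_with_unmatched_nulls = 0;
  for (row_t const& row : rows) {
    bool const has_null = std::any_of(
      row.begin(), row.end(), [](std::optional<int> const& e) { return !e.has_value(); });
    if (has_null && nulls_equal == null_equality_model::UNEQUAL) {
      // Assumption: a null never equals another null, so this row cannot match any other row.
      ++rows_with_unmatched_nulls;
    } else {
      distinct_rows.insert(row);
    }
  }
  return distinct_rows.size() + rows_with_unmatched_nulls;
}
```

For the rows (1, 2), (1, 2), (1, null), (1, null), (3, 4), the model counts 3 unique rows with EQUAL and 4 with UNEQUAL, because the two rows containing a null no longer collapse into one.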

◆ drop_duplicates()

std::unique_ptr<table> cudf::drop_duplicates ( table_view const &  input,
std::vector< size_type > const &  keys,
duplicate_keep_option  keep,
null_equality  nulls_equal = null_equality::EQUAL,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Create a new table without duplicate rows.

Given an input table_view, each row is copied to output table if the corresponding row of keys columns is unique, where the definition of unique depends on the value of keep:

  • KEEP_FIRST: only the first of a sequence of duplicate rows is copied
  • KEEP_LAST: only the last of a sequence of duplicate rows is copied
  • KEEP_NONE: no duplicate rows are copied
Exceptions
cudf::logic_error  if the input row size does not match keys.
Parameters
[in]  input  Input table_view to copy only unique rows.
[in]  keys  Vector of indices representing key columns from input.
[in]  keep  Keep first entry, last entry, or no entries if duplicates found.
[in]  nulls_equal  Flag to denote nulls are equal if null_equality::EQUAL, nulls are not equal if null_equality::UNEQUAL.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table with unique rows as per specified keep.
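
The three keep options can be compared with a host-side model. This is a sketch in plain C++ and not the libcudf implementation: rows are std::vector<int> without nulls, the enum and function names are hypothetical, and the model emits surviving rows in input order, which is an assumption because the text above does not specify the output order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using row_t = std::vector<int>;

// Hypothetical stand-in for cudf::duplicate_keep_option.
enum class keep_model { KEEP_FIRST, KEEP_LAST, KEEP_NONE };

std::vector<row_t> drop_duplicates_model(std::vector<row_t> const& rows,
                                         std::vector<std::size_t> const& keys,
                                         keep_model keep)
{
  // Projects a row onto its key columns.
  auto key_of = [&keys](row_t const& row) {
    row_t key;
    for (std::size_t column : keys) { key.push_back(row.at(column)); }
    return key;
  };

  // For every distinct key, remember where it first and last occurs and how often.
  struct group_info {
    std::size_t first;
    std::size_t last;
    std::size_t count;
  };
  std::map<row_t, group_info> groups;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    auto it = groups.try_emplace(key_of(rows[i]), group_info{i, i, 0}).first;
    it->second.last = i;
    ++it->second.count;
  }

  std::vector<row_t> out;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    group_info const& g = groups.at(key_of(rows[i]));
    bool const keep_row = (keep == keep_model::KEEP_FIRST && g.first == i) ||
                          (keep == keep_model::KEEP_LAST && g.last == i) ||
                          (keep == keep_model::KEEP_NONE && g.count == 1);
    if (keep_row) { out.push_back(rows[i]); }  // assumption: input order is kept
  }
  return out;
}
```

With rows (1, 10), (2, 20), (1, 30), (3, 40), (2, 50) and keys = {0}, KEEP_FIRST yields (1, 10), (2, 20), (3, 40); KEEP_LAST yields (1, 30), (3, 40), (2, 50); KEEP_NONE yields only (3, 40).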

◆ drop_nans() [1/2]

std::unique_ptr<table> cudf::drop_nans ( table_view const &  input,
std::vector< size_type > const &  keys,
cudf::size_type  keep_threshold,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filters a table to remove NANs with threshold count.

Filters the rows of the input considering specified columns indicated in keys for NANs. These key columns must be of floating-point type.

Given an input table_view, row i from the input columns is copied to the output if the same row i of keys has at least keep_threshold non-NAN elements.

This operation is stable: the input order is preserved in the output.

input {col1: {1.0, 2.0, 3.0, NAN},
col2: {4.0, null, NAN, NAN},
col3: {7.0, NAN, NAN, NAN}}
keys = {0, 1, 2} // All columns
keep_threshold = 2
output {col1: {1.0, 2.0}
col2: {4.0, null}
col3: {7.0, NAN}}
Note
if input.num_rows() is zero, or keys is empty, there is no error, and an empty table is returned
Exceptions
cudf::logic_error  if the keys columns are not of floating-point type.
Parameters
[in]  input  The input table_view to filter.
[in]  keys  Vector of indices representing key columns from input.
[in]  keep_threshold  The minimum number of non-NAN elements in a row required to keep the row.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table containing all rows of the input with at least keep_threshold non-NAN elements in keys.

◆ drop_nans() [2/2]

std::unique_ptr<table> cudf::drop_nans ( table_view const &  input,
std::vector< size_type > const &  keys,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filters a table to remove NANs.

Filters the rows of the input considering specified columns indicated in keys for NANs. These key columns must be of floating-point type.

input {col1: {1.0, 2.0, 3.0, NAN},
col2: {4.0, null, NAN, NAN},
col3: {null, NAN, NAN, NAN}}
keys = {0, 1, 2} // All columns; keep_threshold defaults to 3, the number of key columns
output {col1: {1.0}
col2: {4.0}
col3: {null}}

Same as drop_nans but defaults keep_threshold to the number of columns in keys.

Parameters
[in]  input  The input table_view to filter.
[in]  keys  Vector of indices representing key columns from input.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table containing all rows of the input without NANs in the columns of keys.
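
Both drop_nans overloads can be traced with a host-side model. This is a sketch in plain C++, not the libcudf implementation: the table is stored row by row, std::optional<double> stands in for a nullable element, the name drop_nans_model is hypothetical, and the model returns the indices of the surviving rows rather than a new table. A null is counted as a non-NAN element, which is inferred from the threshold example above, where the row (2.0, null, NAN) is kept with keep_threshold = 2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

using cell_t = std::optional<double>;  // std::nullopt stands for a null
using row_t  = std::vector<cell_t>;

// Hypothetical host-side model: returns the indices of the rows that are kept, in input order.
// The empty-input and empty-keys cases from the note are not modelled.
std::vector<std::size_t> drop_nans_model(std::vector<row_t> const& rows,
                                         std::vector<std::size_t> const& keys,
                                         std::size_t keep_threshold)
{
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    std::size_t non_nan = 0;
    for (std::size_t k : keys) {
      cell_t const& cell = rows[i].at(k);
      // A null is not a NAN, so it counts towards the threshold.
      if (!cell.has_value() || !std::isnan(*cell)) { ++non_nan; }
    }
    if (non_nan >= keep_threshold) { kept.push_back(i); }
  }
  return kept;
}

// Overload without a threshold: every key column of a kept row must be non-NAN.
std::vector<std::size_t> drop_nans_model(std::vector<row_t> const& rows,
                                         std::vector<std::size_t> const& keys)
{
  return drop_nans_model(rows, keys, keys.size());
}
```

Applied to the two example inputs, the threshold form with keep_threshold = 2 keeps rows 0 and 1, and the form without a threshold keeps only row 0.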

◆ drop_nulls() [1/2]

std::unique_ptr<table> cudf::drop_nulls ( table_view const &  input,
std::vector< size_type > const &  keys,
cudf::size_type  keep_threshold,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filters a table to remove null elements with threshold count.

Filters the rows of the input considering specified columns indicated in keys for validity / null values.

Given an input table_view, row i from the input columns is copied to the output if the same row i of keys has at least keep_threshold non-null fields.

This operation is stable: the input order is preserved in the output.

Any non-nullable column in the input is treated as all non-null.

input {col1: {1, 2, 3, null},
col2: {4, 5, null, null},
col3: {7, null, null, null}}
keys = {0, 1, 2} // All columns
keep_threshold = 2
output {col1: {1, 2}
col2: {4, 5}
col3: {7, null}}
Note
if input.num_rows() is zero, or keys is empty or has no nulls, there is no error, and an empty table is returned
Parameters
[in]  input  The input table_view to filter.
[in]  keys  Vector of indices representing key columns from input.
[in]  keep_threshold  The minimum number of non-null fields in a row required to keep the row.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table containing all rows of the input with at least keep_threshold non-null fields in keys.

◆ drop_nulls() [2/2]

std::unique_ptr<table> cudf::drop_nulls ( table_view const &  input,
std::vector< size_type > const &  keys,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filters a table to remove null elements.

Filters the rows of the input considering specified columns indicated in keys for validity / null values.

input {col1: {1, 2, 3, null},
col2: {4, 5, null, null},
col3: {7, null, null, null}}
keys = {0, 1, 2} // All columns
output {col1: {1}
col2: {4}
col3: {7}}

Same as drop_nulls but defaults keep_threshold to the number of columns in keys.

Parameters
[in]  input  The input table_view to filter.
[in]  keys  Vector of indices representing key columns from input.
[in]  mr  Device memory resource used to allocate the returned table's device memory.
Returns
Table containing all rows of the input without nulls in the columns of keys.
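
The relationship between the two drop_nulls overloads can be sketched with a host-side model. This is plain C++ under stated assumptions, not the libcudf implementation: the table is stored row by row, std::optional<int> stands in for a nullable element, and the name drop_nulls_model is hypothetical. The special cases from the note (empty input, empty keys, keys without nulls) are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using cell_t = std::optional<int>;  // std::nullopt stands for a null
using row_t  = std::vector<cell_t>;

// Hypothetical host-side model: keeps row i when at least keep_threshold key fields are non-null.
std::vector<row_t> drop_nulls_model(std::vector<row_t> const& rows,
                                    std::vector<std::size_t> const& keys,
                                    std::size_t keep_threshold)
{
  std::vector<row_t> out;
  for (row_t const& row : rows) {
    std::size_t non_null = 0;
    for (std::size_t k : keys) {
      if (row.at(k).has_value()) { ++non_null; }
    }
    if (non_null >= keep_threshold) { out.push_back(row); }  // input order is preserved
  }
  return out;
}

// Overload without a threshold: the threshold defaults to the number of key columns,
// so a kept row has no nulls in any key column.
std::vector<row_t> drop_nulls_model(std::vector<row_t> const& rows,
                                    std::vector<std::size_t> const& keys)
{
  return drop_nulls_model(rows, keys, keys.size());
}
```

On the example input, the threshold form with keep_threshold = 2 keeps rows (1, 4, 7) and (2, 5, null), while the form without a threshold keeps only (1, 4, 7).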