Files | |
file | stream_compaction.hpp |
Column APIs for filtering rows. | |
Enumerations | |
enum | cudf::duplicate_keep_option { cudf::duplicate_keep_option::KEEP_FIRST = 0, cudf::duplicate_keep_option::KEEP_LAST, cudf::duplicate_keep_option::KEEP_NONE } |
Choices for which duplicate rows the drop_duplicates API retains. | |
Functions | |
std::unique_ptr< table > | cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Filters a table to remove null elements with threshold count. | |
std::unique_ptr< table > | cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Filters a table to remove null elements. | |
std::unique_ptr< table > | cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Filters a table to remove NANs with threshold count. | |
std::unique_ptr< table > | cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Filters a table to remove NANs. | |
std::unique_ptr< table > | cudf::apply_boolean_mask (table_view const &input, column_view const &boolean_mask, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Filters input using boolean_mask of boolean values as a mask. | |
std::unique_ptr< table > | cudf::drop_duplicates (table_view const &input, std::vector< size_type > const &keys, duplicate_keep_option keep, null_equality nulls_equal=null_equality::EQUAL, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Create a new table without duplicate rows. | |
cudf::size_type | cudf::distinct_count (column_view const &input, null_policy null_handling, nan_policy nan_handling) |
Count the unique elements in the column_view. | |
cudf::size_type | cudf::distinct_count (table_view const &input, null_equality nulls_equal=null_equality::EQUAL) |
Count the unique rows in a table. | |
enum class cudf::duplicate_keep_option
Choices for which duplicate rows the drop_duplicates API retains.
Enumerator | |
---|---|
KEEP_FIRST | Keeps the first duplicate row and unique rows. |
KEEP_LAST | Keeps the last duplicate row and unique rows. |
KEEP_NONE | Keeps only unique rows. |
Definition at line 210 of file stream_compaction.hpp.
std::unique_ptr<table> cudf::apply_boolean_mask (table_view const &input, column_view const &boolean_mask, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Filters input using boolean_mask of boolean values as a mask.
Given an input table_view and a mask column_view, an element i from each column_view of the input is copied to the corresponding output column if the corresponding element i in the mask is non-null and true. This operation is stable: the input order is preserved.
Note: if input.num_rows() is zero, there is no error, and an empty table is returned.
cudf::logic_error | if the input size and the boolean_mask size do not match. |
cudf::logic_error | if boolean_mask is not of type type_id::BOOL8. |
[in] | input | The input table_view to filter |
[in] | boolean_mask | A nullable column_view of type type_id::BOOL8 used as a mask to filter the input. |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
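The filtering rule described above can be sketched on the host with plain standard C++. This is only a model of the documented semantics for a single column, not libcudf code: the name `apply_mask_model` is hypothetical, `std::nullopt` stands in for a null mask element, and `std::logic_error` stands in for `cudf::logic_error`.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Host-side model of the documented rule (hypothetical helper, not a libcudf API).
template <typename T>
std::vector<T> apply_mask_model(std::vector<T> const& column,
                                std::vector<std::optional<bool>> const& mask)
{
  // Mirrors the documented size-mismatch error with a standard exception.
  if (column.size() != mask.size()) {
    throw std::logic_error("column and mask sizes differ");
  }
  std::vector<T> out;
  for (std::size_t i = 0; i < column.size(); ++i) {
    // Element i is kept only when mask element i is non-null and true.
    if (mask[i].has_value() && *mask[i]) { out.push_back(column[i]); }
  }
  return out;  // elements appear in input order, so the filter is stable
}
```

For example, filtering `{10, 20, 30, 40}` with the mask `{true, null, false, true}` keeps `10` and `40`: the null mask entry drops its element just as `false` does.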
Returns a table containing a copy of all elements of input passing the filter defined by boolean_mask.
cudf::size_type cudf::distinct_count (column_view const &input, null_policy null_handling, nan_policy nan_handling)
Count the unique elements in the column_view.
Given an input column_view, the number of unique elements in this column_view is returned.
If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_NULL, both NaN and null values are ignored. If null_handling is null_policy::EXCLUDE and nan_handling is nan_policy::NAN_IS_VALID, only null is ignored and NaN is considered in the unique count.
[in] | input | The column_view whose unique elements will be counted. |
[in] | null_handling | flag to include or ignore null while counting |
[in] | nan_handling | flag to consider NaN==null or not. |
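The two null_policy::EXCLUDE cases described above can be illustrated with a small host-side model. This is a sketch, not libcudf code: `distinct_count_excluding_nulls` is a hypothetical name, `std::nullopt` stands in for null, and the boolean `nan_is_null` selects between the nan_policy::NAN_IS_NULL and nan_policy::NAN_IS_VALID behaviours. It additionally assumes that all NaN values together count as one distinct value, which the text above does not state explicitly.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Host-side model of the two documented EXCLUDE cases (hypothetical helper).
//   nan_is_null == true  models nan_policy::NAN_IS_NULL
//   nan_is_null == false models nan_policy::NAN_IS_VALID
std::size_t distinct_count_excluding_nulls(std::vector<std::optional<double>> const& column,
                                           bool nan_is_null)
{
  std::set<double> seen;  // distinct non-NaN values
  bool saw_nan = false;
  for (auto const& v : column) {
    if (!v.has_value()) { continue; }  // nulls are ignored in both cases
    if (std::isnan(*v)) {
      saw_nan = true;  // tracked separately: NaN would break std::set ordering
    } else {
      seen.insert(*v);
    }
  }
  // Assumption: under NAN_IS_VALID, all NaNs count as a single distinct value.
  return seen.size() + ((saw_nan && !nan_is_null) ? 1 : 0);
}
```

Under these assumptions, a column holding `1.0, null, NaN, 1.0, 2.0, NaN` has two distinct values when NaN is treated as null and three when NaN is treated as valid.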
cudf::size_type cudf::distinct_count (table_view const &input, null_equality nulls_equal=null_equality::EQUAL)
Count the unique rows in a table.
[in] | input | Table whose unique rows will be counted. |
[in] | nulls_equal | flag to denote whether null elements should be considered equal; nulls are not equal if null_equality::UNEQUAL |
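The effect of nulls_equal on the row count can be sketched with a host-side model. This is not libcudf code: `row_model`, `rows_equal`, and `distinct_row_count_model` are hypothetical names, `std::nullopt` stands in for null, and the boolean `nulls_equal` stands in for the null_equality argument. The sketch assumes that, when nulls are unequal, a row containing a null never matches any other row.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

using row_model = std::vector<std::optional<int>>;  // std::nullopt stands in for null

// Row equality under the chosen null-equality rule (hypothetical helper).
bool rows_equal(row_model const& a, row_model const& b, bool nulls_equal)
{
  if (a.size() != b.size()) { return false; }
  for (std::size_t c = 0; c < a.size(); ++c) {
    bool const a_null = !a[c].has_value();
    bool const b_null = !b[c].has_value();
    if (a_null || b_null) {
      // Two nulls match only when nulls are considered equal.
      if (!(a_null && b_null && nulls_equal)) { return false; }
    } else if (*a[c] != *b[c]) {
      return false;
    }
  }
  return true;
}

// Counts rows that do not match any earlier row (quadratic, for clarity only).
std::size_t distinct_row_count_model(std::vector<row_model> const& rows, bool nulls_equal)
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    bool seen_before = false;
    for (std::size_t j = 0; j < i && !seen_before; ++j) {
      seen_before = rows_equal(rows[i], rows[j], nulls_equal);
    }
    if (!seen_before) { ++count; }
  }
  return count;
}
```

With the rows `(1, null)`, `(1, null)`, `(2, 3)`, `(2, 3)`, this model counts two distinct rows when nulls are equal and three when they are not, because the two rows containing a null no longer match each other.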
std::unique_ptr<table> cudf::drop_duplicates (table_view const &input, std::vector< size_type > const &keys, duplicate_keep_option keep, null_equality nulls_equal=null_equality::EQUAL, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Create a new table without duplicate rows.
Given an input table_view, each row is copied to the output table if the corresponding row of the keys columns is unique, where the definition of unique depends on the value of keep; see the duplicate_keep_option enumerators above.
cudf::logic_error | if the input row size mismatches with keys. |
[in] | input | input table_view to copy only unique rows |
[in] | keys | vector of indices representing key columns from input |
[in] | keep | keep first entry, last entry, or no entries if duplicates found |
[in] | nulls_equal | flag to denote whether nulls are equal: equal if null_equality::EQUAL, not equal if null_equality::UNEQUAL |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
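How the three keep options differ can be sketched with a host-side model over a single integer key column and one payload column. This is not libcudf code: `keep_model`, `keyed_row`, and `drop_duplicates_model` are hypothetical names. The model emits the kept rows in input order, which is an assumption; the text above does not specify the output order.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum class keep_model { KEEP_FIRST, KEEP_LAST, KEEP_NONE };  // mirrors duplicate_keep_option

using keyed_row = std::pair<int, char>;  // (key column value, payload column value)

// Host-side model of the keep options (hypothetical helper, quadratic for clarity).
std::vector<keyed_row> drop_duplicates_model(std::vector<keyed_row> const& rows, keep_model keep)
{
  std::vector<keyed_row> out;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    bool dup_before = false;  // an earlier row has the same key
    bool dup_after  = false;  // a later row has the same key
    for (std::size_t j = 0; j < rows.size(); ++j) {
      if (j == i || rows[j].first != rows[i].first) { continue; }
      if (j < i) { dup_before = true; } else { dup_after = true; }
    }
    bool keep_row = false;
    switch (keep) {
      case keep_model::KEEP_FIRST: keep_row = !dup_before; break;
      case keep_model::KEEP_LAST: keep_row = !dup_after; break;
      case keep_model::KEEP_NONE: keep_row = !dup_before && !dup_after; break;
    }
    if (keep_row) { out.push_back(rows[i]); }
  }
  return out;  // assumption: rows are emitted in input order
}
```

For the rows `(1, a)`, `(2, b)`, `(1, c)`, `(3, d)`, the model keeps `a, b, d` under KEEP_FIRST, `b, c, d` under KEEP_LAST, and only `b, d` under KEEP_NONE.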
Returns a table with unique rows as specified by keep.
std::unique_ptr<table> cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Filters a table to remove NANs with threshold count.
Filters the rows of the input considering specified columns indicated in keys for NANs. These key columns must be of floating-point type.
Given an input table_view, row i from the input columns is copied to the output if the same row i of keys has at least keep_threshold non-NAN elements.
This operation is stable: the input order is preserved in the output.
Note: if input.num_rows() is zero, or keys is empty, there is no error, and an empty table is returned.
cudf::logic_error | if the keys columns are not of floating-point type. |
[in] | input | The input table_view to filter. |
[in] | keys | vector of indices representing key columns from input |
[in] | keep_threshold | The minimum number of non-NAN elements in a row required to keep the row. |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
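The threshold rule described above can be sketched with a host-side model in which each row is a vector of doubles. This is not libcudf code: `float_row` and `drop_nans_model` are hypothetical names, and the special cases from the note (zero rows, empty keys) are not modeled.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using float_row = std::vector<double>;  // one value per column

// Host-side model of the threshold rule (hypothetical helper, not a libcudf API).
std::vector<float_row> drop_nans_model(std::vector<float_row> const& rows,
                                       std::vector<std::size_t> const& keys,
                                       std::size_t keep_threshold)
{
  std::vector<float_row> out;
  for (auto const& row : rows) {
    // Count the non-NaN elements among the key columns of this row.
    std::size_t non_nan = 0;
    for (std::size_t k : keys) {
      if (!std::isnan(row.at(k))) { ++non_nan; }
    }
    // Non-key columns are copied along but never inspected.
    if (non_nan >= keep_threshold) { out.push_back(row); }
  }
  return out;  // rows keep their input order
}
```

With key columns 0 and 1, a row `(1.0, NaN, 5.0)` survives a threshold of 1 but not a threshold of 2. Passing `keys.size()` as the threshold keeps only rows with no NaN in any key column.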
Returns a table containing all rows of the input with at least keep_threshold non-NAN elements in keys.
std::unique_ptr<table> cudf::drop_nans (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Filters a table to remove NANs.
Filters the rows of the input considering specified columns indicated in keys for NANs. These key columns must be of floating-point type.
Same as drop_nans but defaults keep_threshold to the number of columns in keys.
[in] | input | The input table_view to filter. |
[in] | keys | vector of indices representing key columns from input |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
Returns a table containing all rows of the input without NANs in the columns of keys.
std::unique_ptr<table> cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, cudf::size_type keep_threshold, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Filters a table to remove null elements with threshold count.
Filters the rows of the input considering specified columns indicated in keys for validity / null values.
Given an input table_view, row i from the input columns is copied to the output if the same row i of keys has at least keep_threshold non-null fields.
This operation is stable: the input order is preserved in the output.
Any non-nullable column in the input is treated as all non-null.
Note: if input.num_rows() is zero, or keys is empty or has no nulls, there is no error, and an empty table is returned.
[in] | input | The input table_view to filter. |
[in] | keys | vector of indices representing key columns from input |
[in] | keep_threshold | The minimum number of non-null fields in a row required to keep the row. |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
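The null-threshold rule, and its relationship to the overload without keep_threshold, can be sketched with a host-side model. This is not libcudf code: `nullable_row` and `drop_nulls_model` are hypothetical names, `std::nullopt` stands in for a null field, and the special cases from the note are not modeled.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

using nullable_row = std::vector<std::optional<int>>;  // std::nullopt stands in for null

// Host-side model of the threshold rule (hypothetical helper, not a libcudf API).
std::vector<nullable_row> drop_nulls_model(std::vector<nullable_row> const& rows,
                                           std::vector<std::size_t> const& keys,
                                           std::size_t keep_threshold)
{
  std::vector<nullable_row> out;
  for (auto const& row : rows) {
    // Count the non-null fields among the key columns of this row.
    std::size_t non_null = 0;
    for (std::size_t k : keys) {
      if (row.at(k).has_value()) { ++non_null; }
    }
    if (non_null >= keep_threshold) { out.push_back(row); }
  }
  return out;  // rows keep their input order
}

// Model of the overload without a threshold: every key field must be non-null.
std::vector<nullable_row> drop_nulls_model(std::vector<nullable_row> const& rows,
                                           std::vector<std::size_t> const& keys)
{
  return drop_nulls_model(rows, keys, keys.size());
}
```

With key columns 0 and 1, the row `(1, null, 7)` is kept at a threshold of 1 and dropped by the two-argument form, while `(2, 3, null)` is kept in both cases because its null sits in a non-key column.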
Returns a table containing all rows of the input with at least keep_threshold non-null fields in keys.
std::unique_ptr<table> cudf::drop_nulls (table_view const &input, std::vector< size_type > const &keys, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Filters a table to remove null elements.
Filters the rows of the input considering specified columns indicated in keys for validity / null values.
Same as drop_nulls but defaults keep_threshold to the number of columns in keys.
[in] | input | The input table_view to filter. |
[in] | keys | vector of indices representing key columns from input |
[in] | mr | Device memory resource used to allocate the returned table's device memory |
Returns a table containing all rows of the input without nulls in the columns of keys.