Lists Filtering
- group Filtering
Functions
-
std::unique_ptr<column> apply_boolean_mask(lists_column_view const &input, lists_column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Filters elements in each row of input LIST column using boolean_mask LIST of booleans as a mask.
Given an input LIST column and a list-of-bools column, the function produces a new LIST column of the same type as input, where each element is copied from the input row only if the corresponding boolean_mask is non-null and true.
E.g.
input        = { {0,1,2}, {3,4}, {5,6,7}, {8,9} };
boolean_mask = { {0,1,1}, {1,0}, {1,1,1}, {0,0} };
results      = { {1,2},   {3},   {5,6,7}, {} };
input and boolean_mask must have the same number of rows. The output column has the same number of rows as the input column. An element is copied to an output row only if the corresponding boolean_mask element is true. An output row is invalid only if the input row is invalid.
- Throws:
cudf::logic_error – if boolean_mask is not a “lists of bools” column
cudf::logic_error – if input and boolean_mask have different number of rows
- Parameters:
input – The input list column view to be filtered
boolean_mask – A nullable list of bools column used to filter input elements
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
List column of the same type as input, containing filtered list rows
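The masking rules above can be illustrated with a small host-only sketch. It is not the libcudf implementation and does not call libcudf: the aliases IntLists and BoolLists and the function apply_boolean_mask_sketch are hypothetical names that model a LIST column with std::vector and std::optional. Null mask rows and null input elements are not modeled, and throwing when a mask row and its input row differ in length is an assumption of this sketch only.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for LIST columns (not libcudf types).
// An outer std::nullopt models an invalid (null) input row;
// an inner std::nullopt models a null mask element.
using IntList   = std::vector<int>;
using BoolList  = std::vector<std::optional<bool>>;
using IntLists  = std::vector<std::optional<IntList>>;
using BoolLists = std::vector<BoolList>;

// Copies an element to the output row only if the corresponding mask
// element is non-null and true. An output row is null only if the
// input row is null.
IntLists apply_boolean_mask_sketch(IntLists const& input, BoolLists const& mask)
{
  if (input.size() != mask.size()) {
    throw std::invalid_argument("input and boolean_mask must have the same number of rows");
  }

  IntLists result;
  result.reserve(input.size());

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) {
      result.push_back(std::nullopt);  // null input row stays null
      continue;
    }
    IntList const& values = *input[row];
    BoolList const& flags = mask[row];
    // Assumption of this sketch: each mask row is as long as its input row.
    if (values.size() != flags.size()) {
      throw std::invalid_argument("mask row length differs from input row length");
    }

    IntList kept;
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (flags[i].value_or(false)) { kept.push_back(values[i]); }
    }
    result.push_back(std::move(kept));
  }
  return result;
}
```

Called with the input and boolean_mask rows from the example above, the sketch returns { {1,2}, {3}, {5,6,7}, {} }.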
-
std::unique_ptr<column> distinct(lists_column_view const &input, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, duplicate_keep_option keep_option = duplicate_keep_option::KEEP_ANY, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Create a new list column without duplicate elements in each list.
Given a lists column input, distinct elements of each list are copied to the corresponding output list. The order of lists is preserved while the order of elements within each list is not guaranteed.
Example:
input  = { {0, 1, 2, 3, 2}, {3, 1, 2}, null, {4, null, null, 5} }
result = { {0, 1, 2, 3}, {3, 1, 2}, null, {4, null, 5} }
- Parameters:
input – The input lists column
nulls_equal – Flag to specify whether null elements should be considered as equal
nans_equal – Flag to specify whether floating-point NaNs should be considered as equal
keep_option – Flag to specify which element to keep (first, last, any)
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
The resulting lists column containing lists without duplicates
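The per-list deduplication described above can be sketched on the host as follows. This is an illustration of the semantics only, not the libcudf implementation: the aliases Elem, List and Lists and the function distinct_sketch are hypothetical, the bool parameter nulls_equal stands in for null_equality, and NaN handling is not modeled because the elements are integers. The sketch keeps the first occurrence of each element, which is one valid outcome given that the element order within each list is not guaranteed.

```cpp
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a LIST column (not libcudf types).
// An outer std::nullopt models a null row; an inner one a null element.
using Elem  = std::optional<int>;
using List  = std::vector<Elem>;
using Lists = std::vector<std::optional<List>>;

// Removes duplicate elements within each list, keeping first occurrences.
// When nulls_equal is true, all nulls in a list collapse into one;
// otherwise every null is treated as distinct and kept.
Lists distinct_sketch(Lists const& input, bool nulls_equal = true)
{
  Lists result;
  result.reserve(input.size());

  for (auto const& row : input) {
    if (!row.has_value()) {
      result.push_back(std::nullopt);  // null rows are passed through
      continue;
    }
    List kept;
    std::set<int> seen;      // non-null values already emitted for this row
    bool seen_null = false;  // whether a null was already emitted for this row
    for (auto const& element : *row) {
      if (!element.has_value()) {
        if (nulls_equal && seen_null) { continue; }
        seen_null = true;
        kept.push_back(element);
      } else if (seen.insert(*element).second) {
        kept.push_back(element);
      }
    }
    result.push_back(std::move(kept));
  }
  return result;
}
```

Called with the input from the example above and nulls_equal left at true, the sketch returns { {0, 1, 2, 3}, {3, 1, 2}, null, {4, null, 5} }.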