Lists Filtering
- group lists_filtering
Functions
-
std::unique_ptr<column> apply_boolean_mask(lists_column_view const &input, lists_column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Filters elements in each row of the input LIST column using the boolean_mask LIST of booleans as a mask.
Given an input LIST column and a list-of-bools column, the function produces a new LIST column of the same type as input, where each element is copied from the input row only if the corresponding boolean_mask element is non-null and true.
E.g.
input        = { {0,1,2}, {3,4}, {5,6,7}, {8,9} };
boolean_mask = { {0,1,1}, {1,0}, {1,1,1}, {0,0} };
results      = { {1,2},   {3},   {5,6,7}, {} };
input and boolean_mask must have the same number of rows. The output column has the same number of rows as the input column. An element is copied to an output row only if the corresponding boolean_mask element is true. An output row is invalid only if the input row is invalid.
- Throws:
cudf::logic_error – if boolean_mask is not a “lists of bools” column
cudf::logic_error – if input and boolean_mask have a different number of rows
- Parameters:
input – The input list column view to be filtered
boolean_mask – A nullable list of bools column used to filter input elements
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
List column of the same type as input, containing filtered list rows
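The masking rules above can be modeled on the host with standard containers. The following is a minimal sketch of the documented semantics only, not the libcudf implementation and not a call into libcudf: the names apply_boolean_mask_model, IntListRow and BoolMaskRow are hypothetical, elements are plain int, null mask rows are not modeled, and the requirement that each mask row be as long as its input row is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins, NOT libcudf types.
using IntListRow  = std::optional<std::vector<int>>;   // std::nullopt models a null (invalid) row
using BoolMaskRow = std::vector<std::optional<bool>>;  // std::nullopt models a null mask element

// Host-side model of the documented behavior: keep an element only when its
// mask element is non-null and true; a null input row stays null.
std::vector<IntListRow> apply_boolean_mask_model(std::vector<IntListRow> const& input,
                                                 std::vector<BoolMaskRow> const& boolean_mask)
{
  if (input.size() != boolean_mask.size()) {
    throw std::logic_error("input and boolean_mask must have the same number of rows");
  }

  std::vector<IntListRow> result;
  result.reserve(input.size());

  for (std::size_t row = 0; row < input.size(); ++row) {
    if (!input[row].has_value()) {
      result.emplace_back(std::nullopt);  // invalid input row -> invalid output row
      continue;
    }
    auto const& elements = *input[row];
    auto const& mask     = boolean_mask[row];
    if (elements.size() != mask.size()) {
      // Assumption of this sketch, not taken from the documentation above.
      throw std::logic_error("model assumption: mask row length must match input row length");
    }

    std::vector<int> kept;
    for (std::size_t i = 0; i < elements.size(); ++i) {
      if (mask[i].has_value() && *mask[i]) { kept.push_back(elements[i]); }
    }
    result.emplace_back(std::move(kept));  // may be empty, but is still a valid row
  }
  return result;
}
```

Fed the input and boolean_mask from the example above, this model yields { {1,2}, {3}, {5,6,7}, {} }: the last row is empty but valid, because only an invalid input row produces an invalid output row.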
-
std::unique_ptr<column> distinct(lists_column_view const &input, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Create a new list column without duplicate elements in each list.
Given a lists column input, distinct elements of each list are copied to the corresponding output list. The order of lists is preserved while the order of elements within each list is not guaranteed.
Example:
input  = { {0, 1, 2, 3, 2}, {3, 1, 2}, null, {4, null, null, 5} }
result = { {0, 1, 2, 3},    {3, 1, 2}, null, {4, null, 5} }
- Parameters:
input – The input lists column
nulls_equal – Flag to specify whether null elements should be considered as equal
nans_equal – Flag to specify whether floating-point NaNs should be considered as equal
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
The resulting lists column containing lists without duplicates
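The per-list deduplication described above can likewise be modeled on the host. This is a rough sketch under stated assumptions, not the libcudf algorithm: distinct_model, NullableInt and NullableIntList are hypothetical names, a plain bool stands in for null_equality, elements are int so nans_equal is not modeled, and the sketch keeps elements in first-occurrence order, which is only one of the orderings the documentation permits.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins, NOT libcudf types.
using NullableInt     = std::optional<int>;                       // std::nullopt models a null element
using NullableIntList = std::optional<std::vector<NullableInt>>;  // std::nullopt models a null list

// Host-side model: drop duplicate elements within each list, keep list order.
// nulls_equal == true mirrors null_equality::EQUAL (all nulls collapse to one).
std::vector<NullableIntList> distinct_model(std::vector<NullableIntList> const& input,
                                            bool nulls_equal = true)
{
  std::vector<NullableIntList> result;
  result.reserve(input.size());

  for (auto const& row : input) {
    if (!row.has_value()) {
      result.emplace_back(std::nullopt);  // a null list stays null
      continue;
    }

    std::vector<NullableInt> kept;
    for (auto const& element : *row) {
      bool const duplicate =
        std::any_of(kept.begin(), kept.end(), [&](NullableInt const& seen) {
          if (!element.has_value() || !seen.has_value()) {
            // Two nulls match only when nulls are treated as equal.
            return nulls_equal && !element.has_value() && !seen.has_value();
          }
          return *element == *seen;
        });
      if (!duplicate) { kept.push_back(element); }
    }
    result.emplace_back(std::move(kept));
  }
  return result;
}
```

A quadratic scan keeps the sketch short; it is chosen for readability, not performance. With nulls_equal set to false, the two nulls in {4, null, null, 5} are both retained by this model.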