Set Operations#
- group set_operations
Functions
-
std::unique_ptr<column> have_overlap(lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Check if lists at each row of the given lists columns overlap.
Given two input lists columns, each list row in one column is checked if it has any common elements with the corresponding row of the other column.
A null input row in any of the input lists columns will result in a null output row.
Example:
lhs = { {0, 1, 2}, {1, 2, 3}, null, {4, null, 5} } rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} } result = { true, false, null, true }
- Throws:
cudf::logic_error – if the input lists columns have different sizes.
cudf::logic_error – if children of the input lists columns have different data types.
- Parameters:
lhs – The input lists column for one side
rhs – The input lists column for the other side
nulls_equal – Flag to specify whether null elements should be considered as equal, default to be
UNEQUAL
which means only non-null elements are checked for overlappingnans_equal – Flag to specify whether floating-point NaNs should be considered as equal
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
A column of type BOOL containing the check results
-
std::unique_ptr<column> intersect_distinct(lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Create a lists column of distinct elements common to two input lists columns.
Given two input lists columns
lhs
andrhs
, an output lists column is created in a way such that each of its rowi
contains a list of distinct elements that can be found in bothlhs[i]
andrhs[i]
.The order of distinct elements in the output rows is unspecified.
A null input row in any of the input lists columns will result in a null output row.
Example:
lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} } rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} } result = { {1, 2}, {}, null, {null} }
- Throws:
cudf::logic_error – if the input lists columns have different sizes.
cudf::logic_error – if children of the input lists columns have different data types.
- Parameters:
lhs – The input lists column for one side
rhs – The input lists column for the other side
nulls_equal – Flag to specify whether null elements should be considered as equal
nans_equal – Flag to specify whether floating-point NaNs should be considered as equal
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
A lists column containing the intersection results
-
std::unique_ptr<column> union_distinct(lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Create a lists column of distinct elements found in either of two input lists columns.
Given two input lists columns
lhs
andrhs
, an output lists column is created in a way such that each of its rowi
contains a list of distinct elements that can be found in eitherlhs[i]
orrhs[i]
.The order of distinct elements in the output rows is unspecified.
A null input row in any of the input lists columns will result in a null output row.
Example:
lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} } rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} } result = { {1, 2, 3}, {1, 2, 3, 4, 5}, null, {4, null, 5} }
- Throws:
cudf::logic_error – if the input lists columns have different sizes.
cudf::logic_error – if children of the input lists columns have different data types.
- Parameters:
lhs – The input lists column for one side
rhs – The input lists column for the other side
nulls_equal – Flag to specify whether null elements should be considered as equal
nans_equal – Flag to specify whether floating-point NaNs should be considered as equal
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
A lists column containing the union results
-
std::unique_ptr<column> difference_distinct(lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#
Create a lists column of distinct elements found only in the left input column.
Given two input lists columns
lhs
andrhs
, an output lists column is created in a way such that each of its rowi
contains a list of distinct elements that can be found inlhs[i]
but are not found inrhs[i]
.The order of distinct elements in the output rows is unspecified.
A null input row in any of the input lists columns will result in a null output row.
Example:
lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} } rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} } result = { {}, {1, 2, 3}, null, {4, 5} }
- Throws:
cudf::logic_error – if the input lists columns have different sizes.
cudf::logic_error – if children of the input lists columns have different data types.
- Parameters:
lhs – The input lists column of elements that may be included
rhs – The input lists column of elements to exclude
nulls_equal – Flag to specify whether null elements should be considered as equal
nans_equal – Flag to specify whether floating-point NaNs should be considered as equal
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned object
- Returns:
A lists column containing the difference results
-
std::unique_ptr<column> have_overlap(lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal = null_equality::EQUAL, nan_equality nans_equal = nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource())#