Set Operations

Files

file  set_operations.hpp
 

Functions

std::unique_ptr< column > cudf::lists::have_overlap (lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal=null_equality::EQUAL, nan_equality nans_equal=nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Check if lists at each row of the given lists columns overlap.
 
std::unique_ptr< column > cudf::lists::intersect_distinct (lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal=null_equality::EQUAL, nan_equality nans_equal=nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create a lists column of distinct elements common to two input lists columns.
 
std::unique_ptr< column > cudf::lists::union_distinct (lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal=null_equality::EQUAL, nan_equality nans_equal=nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create a lists column of distinct elements found in either of two input lists columns.
 
std::unique_ptr< column > cudf::lists::difference_distinct (lists_column_view const &lhs, lists_column_view const &rhs, null_equality nulls_equal=null_equality::EQUAL, nan_equality nans_equal=nan_equality::ALL_EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=rmm::mr::get_current_device_resource())
 Create a lists column of distinct elements found only in the left input column.
 

Function Documentation

◆ difference_distinct()

std::unique_ptr<column> cudf::lists::difference_distinct ( lists_column_view const &  lhs,
lists_column_view const &  rhs,
null_equality  nulls_equal = null_equality::EQUAL,
nan_equality  nans_equal = nan_equality::ALL_EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create a lists column of distinct elements found only in the left input column.

Given two input lists columns lhs and rhs, an output lists column is created such that each row i contains the distinct elements that are found in lhs[i] but not in rhs[i].

The order of distinct elements in the output rows is unspecified.

A null row in either input lists column results in a null output row.

Exceptions
cudf::logic_error  if the input lists columns have different sizes.
cudf::logic_error  if children of the input lists columns have different data types.

Example:

lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
result = { {}, {1, 2, 3}, null, {4, 5} }
Parameters
lhs  The input lists column of elements that may be included
rhs  The input lists column of elements to exclude
nulls_equal  Flag to specify whether null elements should be considered as equal
nans_equal  Flag to specify whether floating-point NaNs should be considered as equal
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned object
Returns
A lists column containing the difference results
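
The row-wise semantics described above can be illustrated with a small host-side model. This is only a sketch, not libcudf code: `Elem`, `Row`, `ListsColumn`, `same`, `contains` and `difference_distinct_model` are hypothetical names, `std::nullopt` stands in for nulls, and a `bool` stands in for `null_equality`. The model keeps elements in first-occurrence order, whereas the real function leaves the order unspecified. Keeping every null when `nulls_equal` is false is an assumption of the model.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a lists column; these are not cudf types.
using Elem        = std::optional<int>;                // std::nullopt models a null element
using Row         = std::optional<std::vector<Elem>>;  // std::nullopt models a null row
using ListsColumn = std::vector<Row>;

// Element equality: two nulls match only when nulls_equal is true.
bool same(Elem const& a, Elem const& b, bool nulls_equal)
{
  if (!a || !b) { return nulls_equal && !a && !b; }
  return *a == *b;
}

bool contains(std::vector<Elem> const& v, Elem const& e, bool nulls_equal)
{
  return std::any_of(
    v.begin(), v.end(), [&](Elem const& x) { return same(x, e, nulls_equal); });
}

// Row i of the result holds the distinct elements of lhs[i] that are absent from rhs[i].
ListsColumn difference_distinct_model(ListsColumn const& lhs,
                                      ListsColumn const& rhs,
                                      bool nulls_equal = true)
{
  if (lhs.size() != rhs.size()) { throw std::logic_error("input columns have different sizes"); }
  ListsColumn out(lhs.size());  // every output row starts as null
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!lhs[i] || !rhs[i]) { continue; }  // a null input row gives a null output row
    std::vector<Elem> row;
    for (Elem const& e : *lhs[i]) {
      // Keep e only if rhs[i] lacks it and it has not been emitted already.
      if (!contains(*rhs[i], e, nulls_equal) && !contains(row, e, nulls_equal)) {
        row.push_back(e);
      }
    }
    out[i] = std::move(row);
  }
  return out;
}
```

Applied to the lhs and rhs of the example above with `nulls_equal` set to true, the model yields `{ {}, {1, 2, 3}, null, {4, 5} }`, the same rows as the documented result.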

◆ have_overlap()

std::unique_ptr<column> cudf::lists::have_overlap ( lists_column_view const &  lhs,
lists_column_view const &  rhs,
null_equality  nulls_equal = null_equality::EQUAL,
nan_equality  nans_equal = nan_equality::ALL_EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Check if lists at each row of the given lists columns overlap.

Given two input lists columns, each list row in one column is checked for elements in common with the corresponding row of the other column.

A null row in either input lists column results in a null output row.

Exceptions
cudf::logic_error  if the input lists columns have different sizes.
cudf::logic_error  if children of the input lists columns have different data types.

Example:

lhs = { {0, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
result = { true, false, null, true }
Parameters
lhs  The input lists column for one side
rhs  The input lists column for the other side
nulls_equal  Flag to specify whether null elements should be considered as equal; defaults to EQUAL, so a null element in both rows counts as an overlap. With UNEQUAL, only non-null elements are checked for overlap
nans_equal  Flag to specify whether floating-point NaNs should be considered as equal
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned object
Returns
A column of type BOOL8 containing the check results
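
To make the effect of `nulls_equal` on the overlap check concrete, here is a minimal host-side sketch. It is not the libcudf implementation: `Elem`, `Row`, `ListsColumn` and `have_overlap_model` are hypothetical names, `std::nullopt` models a null, and a `bool` models `null_equality`.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-ins for a lists column; these are not cudf types.
using Elem        = std::optional<int>;                // std::nullopt models a null element
using Row         = std::optional<std::vector<Elem>>;  // std::nullopt models a null row
using ListsColumn = std::vector<Row>;

// Entry i is true when lhs[i] and rhs[i] share an element, and null when either row is null.
std::vector<std::optional<bool>> have_overlap_model(ListsColumn const& lhs,
                                                    ListsColumn const& rhs,
                                                    bool nulls_equal = true)
{
  if (lhs.size() != rhs.size()) { throw std::logic_error("input columns have different sizes"); }
  std::vector<std::optional<bool>> out(lhs.size());  // every entry starts as null
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!lhs[i] || !rhs[i]) { continue; }  // a null input row gives a null output entry
    bool found = false;
    for (Elem const& a : *lhs[i]) {
      found = found || std::any_of(rhs[i]->begin(), rhs[i]->end(), [&](Elem const& b) {
                if (!a || !b) { return nulls_equal && !a && !b; }  // nulls match only if allowed
                return *a == *b;
              });
    }
    out[i] = found;
  }
  return out;
}
```

On the example inputs above the model gives `{ true, false, null, true }` when nulls compare equal. With `nulls_equal` set to false, the last entry becomes false, because the only thing `{4, null, 5}` and `{null, null}` have in common is a null.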

◆ intersect_distinct()

std::unique_ptr<column> cudf::lists::intersect_distinct ( lists_column_view const &  lhs,
lists_column_view const &  rhs,
null_equality  nulls_equal = null_equality::EQUAL,
nan_equality  nans_equal = nan_equality::ALL_EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create a lists column of distinct elements common to two input lists columns.

Given two input lists columns lhs and rhs, an output lists column is created such that each row i contains the distinct elements that are found in both lhs[i] and rhs[i].

The order of distinct elements in the output rows is unspecified.

A null row in either input lists column results in a null output row.

Exceptions
cudf::logic_error  if the input lists columns have different sizes.
cudf::logic_error  if children of the input lists columns have different data types.

Example:

lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
result = { {1, 2}, {}, null, {null} }
Parameters
lhs  The input lists column for one side
rhs  The input lists column for the other side
nulls_equal  Flag to specify whether null elements should be considered as equal
nans_equal  Flag to specify whether floating-point NaNs should be considered as equal
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned object
Returns
A lists column containing the intersection results
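
The example above uses integers, so it does not exercise `nans_equal`. The following host-side sketch models that flag for floating-point lists. It is an illustration under stated assumptions, not libcudf code: `Row`, `ListsColumn`, `same`, `contains` and `intersect_distinct_model` are hypothetical names, a `bool` stands in for `nan_equality`, and element-level nulls are left out to keep the sketch short.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a lists column of doubles; these are not cudf types.
using Row         = std::optional<std::vector<double>>;  // std::nullopt models a null row
using ListsColumn = std::vector<Row>;

// With nans_equal set, any two NaNs match; otherwise IEEE comparison applies (NaN != NaN).
bool same(double a, double b, bool nans_equal)
{
  if (std::isnan(a) && std::isnan(b)) { return nans_equal; }
  return a == b;
}

bool contains(std::vector<double> const& v, double e, bool nans_equal)
{
  return std::any_of(v.begin(), v.end(), [&](double x) { return same(x, e, nans_equal); });
}

// Row i of the result holds the distinct elements present in both lhs[i] and rhs[i].
ListsColumn intersect_distinct_model(ListsColumn const& lhs,
                                     ListsColumn const& rhs,
                                     bool nans_equal = true)
{
  if (lhs.size() != rhs.size()) { throw std::logic_error("input columns have different sizes"); }
  ListsColumn out(lhs.size());  // every output row starts as null
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!lhs[i] || !rhs[i]) { continue; }  // a null input row gives a null output row
    std::vector<double> row;
    for (double e : *lhs[i]) {
      // Keep e only if rhs[i] also has it and it has not been emitted already.
      if (contains(*rhs[i], e, nans_equal) && !contains(row, e, nans_equal)) { row.push_back(e); }
    }
    out[i] = std::move(row);
  }
  return out;
}
```

For `lhs = { {1.0, NaN, 1.0} }` and `rhs = { {NaN, 1.0, 2.0} }`, the model returns `{ {1.0, NaN} }` when NaNs compare equal and `{ {1.0} }` when they do not.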

◆ union_distinct()

std::unique_ptr<column> cudf::lists::union_distinct ( lists_column_view const &  lhs,
lists_column_view const &  rhs,
null_equality  nulls_equal = null_equality::EQUAL,
nan_equality  nans_equal = nan_equality::ALL_EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = rmm::mr::get_current_device_resource() 
)

Create a lists column of distinct elements found in either of two input lists columns.

Given two input lists columns lhs and rhs, an output lists column is created such that each row i contains the distinct elements that are found in either lhs[i] or rhs[i].

The order of distinct elements in the output rows is unspecified.

A null row in either input lists column results in a null output row.

Exceptions
cudf::logic_error  if the input lists columns have different sizes.
cudf::logic_error  if children of the input lists columns have different data types.

Example:

lhs = { {2, 1, 2}, {1, 2, 3}, null, {4, null, 5} }
rhs = { {1, 2, 3}, {4, 5}, {null, 7, 8}, {null, null} }
result = { {1, 2, 3}, {1, 2, 3, 4, 5}, null, {4, null, 5} }
Parameters
lhs  The input lists column for one side
rhs  The input lists column for the other side
nulls_equal  Flag to specify whether null elements should be considered as equal
nans_equal  Flag to specify whether floating-point NaNs should be considered as equal
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned object
Returns
A lists column containing the union results
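
The union semantics above can be sketched on the host as a row-wise concatenation followed by de-duplication. As before, this is a hedged model and not libcudf code: `Elem`, `Row`, `ListsColumn` and `union_distinct_model` are hypothetical names, `std::nullopt` models a null, and a `bool` models `null_equality`. The model emits elements in first-occurrence order, so its rows may be ordered differently from the documented example, which is allowed because the order is unspecified.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins for a lists column; these are not cudf types.
using Elem        = std::optional<int>;                // std::nullopt models a null element
using Row         = std::optional<std::vector<Elem>>;  // std::nullopt models a null row
using ListsColumn = std::vector<Row>;

// Row i of the result holds the distinct elements found in lhs[i] or rhs[i].
ListsColumn union_distinct_model(ListsColumn const& lhs,
                                 ListsColumn const& rhs,
                                 bool nulls_equal = true)
{
  if (lhs.size() != rhs.size()) { throw std::logic_error("input columns have different sizes"); }

  // True when row already holds an element equal to e under the nulls_equal rule.
  auto seen = [&](std::vector<Elem> const& row, Elem const& e) {
    return std::any_of(row.begin(), row.end(), [&](Elem const& x) {
      if (!x || !e) { return nulls_equal && !x && !e; }
      return *x == *e;
    });
  };
  // Append the elements of src that row does not contain yet.
  auto append_new = [&](std::vector<Elem> const& src, std::vector<Elem>& row) {
    for (Elem const& e : src) {
      if (!seen(row, e)) { row.push_back(e); }
    }
  };

  ListsColumn out(lhs.size());  // every output row starts as null
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!lhs[i] || !rhs[i]) { continue; }  // a null input row gives a null output row
    std::vector<Elem> row;
    append_new(*lhs[i], row);
    append_new(*rhs[i], row);
    out[i] = std::move(row);
  }
  return out;
}
```

On the example inputs the model produces `{ {2, 1, 3}, {1, 2, 3, 4, 5}, null, {4, null, 5} }`, which contains the same elements per row as the documented result.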