
Files

file  conditional_join.hpp
 
file  distinct_hash_join.hpp
 
file  filtered_join.hpp
 
file  hash_join.hpp
 
file  join.hpp
 
file  mixed_join.hpp
 
file  sort_merge_join.hpp
 

Classes

class  cudf::distinct_hash_join
 Distinct hash join that builds hash table on creation and probes results in subsequent *_join member functions. More...
 
class  cudf::filtered_join
 Filtered hash join that builds hash table on creation and probes results in subsequent *_join member functions. More...
 
class  cudf::hash_join
 Hash join that builds hash table on creation and probes results in subsequent *_join member functions. More...
 
struct  cudf::join_match_context
 Holds context information about matches between tables during a join operation. More...
 
struct  cudf::join_partition_context
 Stores context information for partitioned join operations. More...
 
class  cudf::sort_merge_join
 Class that implements sort-merge algorithm for table joins. More...
 

Typedefs

using cudf::output_size_data_type = std::optional< std::pair< std::size_t, device_span< size_type const > >>
 Type alias for output size data used in mixed joins. More...
 

Enumerations

enum class  cudf::set_as_build_table { LEFT , RIGHT }
 Specifies which table to use as the build table in a hash join operation. More...
 
enum class  cudf::nullable_join : bool { YES , NO }
 The enum class to specify if any of the input join tables (build table and any later probe table) has nulls. More...
 
enum class  cudf::join_kind : int32_t {
  cudf::INNER_JOIN = 0 , cudf::LEFT_JOIN = 1 , cudf::FULL_JOIN = 2 , cudf::LEFT_SEMI_JOIN = 3 ,
  cudf::LEFT_ANTI_JOIN = 4
}
 Specifies the type of join operation to perform. More...
 

Functions

std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::conditional_inner_join (table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::conditional_left_join (table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in left that have no match in right. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::conditional_full_join (table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in either table that have no match in the other. More...
 
std::unique_ptr< rmm::device_uvector< size_type > > cudf::conditional_left_semi_join (table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns an index vector corresponding to all rows in the left table for which there exists some row in the right table where the predicate evaluates to true. More...
 
std::unique_ptr< rmm::device_uvector< size_type > > cudf::conditional_left_anti_join (table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns an index vector corresponding to all rows in the left table for which there does not exist any row in the right table where the predicate evaluates to true. More...
 
std::size_t cudf::conditional_inner_join_size (table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a conditional inner join between the specified tables where the predicate evaluates to true. More...
 
std::size_t cudf::conditional_left_join_size (table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a conditional left join between the specified tables where the predicate evaluates to true. More...
 
std::size_t cudf::conditional_left_semi_join_size (table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a conditional left semi join between the specified tables where the predicate evaluates to true. More...
 
std::size_t cudf::conditional_left_anti_join_size (table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a conditional left anti join between the specified tables where the predicate evaluates to true. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::inner_join (cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to an inner join between the specified tables. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::left_join (cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to a left join between the specified tables. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::full_join (cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to a full join between the specified tables. More...
 
std::unique_ptr< cudf::table > cudf::cross_join (cudf::table_view const &left, cudf::table_view const &right, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Performs a cross join on two tables (left, right) More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::filter_join_indices (cudf::table_view const &left, cudf::table_view const &right, cudf::device_span< size_type const > left_indices, cudf::device_span< size_type const > right_indices, cudf::ast::expression const &predicate, cudf::join_kind join_kind, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Filters join result indices based on a conditional predicate and join type. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::mixed_inner_join (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, output_size_data_type output_size_data={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::mixed_left_join (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, output_size_data_type output_size_data={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in left that have no match in right. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::mixed_full_join (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, output_size_data_type output_size_data={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in either pair of tables that have no matches in the other pair. More...
 
std::unique_ptr< rmm::device_uvector< size_type > > cudf::mixed_left_semi_join (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns an index vector corresponding to all rows in the left tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables. More...
 
std::unique_ptr< rmm::device_uvector< size_type > > cudf::mixed_left_anti_join (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns an index vector corresponding to all rows in the left tables for which there is no row in the right tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables. More...
 
std::pair< std::size_t, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::mixed_inner_join_size (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a mixed inner join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables. More...
 
std::pair< std::size_t, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::mixed_left_join_size (table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns the exact number of matches (rows) when performing a mixed left join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::sort_merge_inner_join (cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to an inner join between the specified tables. More...
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > cudf::merge_inner_join (cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Returns a pair of row index vectors corresponding to an inner join between the specified tables. More...
 

Variables

constexpr CUDF_HOST_DEVICE size_type cudf::JoinNoMatch = cuda::std::numeric_limits<size_type>::min()
 Sentinel value used to indicate an unmatched row index in join operations. More...
 

Detailed Description

Typedef Documentation

◆ output_size_data_type

using cudf::output_size_data_type = std::optional<std::pair<std::size_t, device_span<size_type const> >>

Type alias for output size data used in mixed joins.

This type represents an optional pair containing:

  • The exact output size of the join operation
  • A device span of per-row match counts for each row in the larger input table

Definition at line 37 of file mixed_join.hpp.

Enumeration Type Documentation

◆ join_kind

enum class cudf::join_kind : int32_t

Specifies the type of join operation to perform.

This enum is used to control the behavior of join operations, particularly in functions like filter_join_indices() that need to apply different logic based on the join semantics.

Enumerator
INNER_JOIN 

Inner join: only matching rows from both tables.

LEFT_JOIN 

Left join: all rows from left table plus matching rows from right.

FULL_JOIN 

Full outer join: all rows from both tables.

LEFT_SEMI_JOIN 

Left semi join: left rows that have matches in right table.

LEFT_ANTI_JOIN 

Left anti join: left rows that have no matches in right table.

Definition at line 37 of file join.hpp.

◆ nullable_join

enum class cudf::nullable_join : bool

The enum class to specify if any of the input join tables (build table and any later probe table) has nulls.

This is used upon hash_join object construction to specify the existence of nulls in all the possible input tables. If such null existence is unknown, YES should be used as the default option.

Definition at line 55 of file hash_join.hpp.

◆ set_as_build_table

Specifies which table to use as the build table in a hash join operation.

See also
filtered_join

Definition at line 38 of file filtered_join.hpp.

Function Documentation

◆ conditional_full_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::conditional_full_join ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in either table that have no match in the other.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and an unspecified out-of-bounds value, representing a row from one table without a match in the other.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{0, 1, 2, None}, {None, 0, 1, 2}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional full join between two tables left and right.
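
The pairing rules described above can be illustrated with a small host-side reference model. This is only a sketch for reasoning about expected output, not how libcudf computes the join on the GPU. The names reference_full_join and kNoMatch are hypothetical, the predicate is modelled as a callable on row indices rather than an ast::expression, and the sentinel used for unmatched rows is an assumption, since the documentation only promises an unspecified out-of-bounds value.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

using size_type = std::int32_t;

// Assumed stand-in for the "unspecified out-of-bounds value" (hypothetical name).
constexpr size_type kNoMatch = std::numeric_limits<size_type>::min();

// Host-side reference: emit (i, j) for every pair where pred(i, j) holds,
// then pad rows without any match on either side with kNoMatch.
std::pair<std::vector<size_type>, std::vector<size_type>> reference_full_join(
  size_type left_rows,
  size_type right_rows,
  std::function<bool(size_type, size_type)> const& pred)
{
  std::vector<size_type> left_out;
  std::vector<size_type> right_out;
  std::vector<bool> right_matched(static_cast<std::size_t>(right_rows), false);

  for (size_type i = 0; i < left_rows; ++i) {
    bool matched = false;
    for (size_type j = 0; j < right_rows; ++j) {
      if (pred(i, j)) {
        left_out.push_back(i);
        right_out.push_back(j);
        matched          = true;
        right_matched[j] = true;
      }
    }
    // Left row with no partner: keep it, paired with the sentinel.
    if (!matched) {
      left_out.push_back(i);
      right_out.push_back(kNoMatch);
    }
  }
  // Right rows that never matched are appended last, paired with the sentinel.
  for (size_type j = 0; j < right_rows; ++j) {
    if (!right_matched[j]) {
      left_out.push_back(kNoMatch);
      right_out.push_back(j);
    }
  }
  return {std::move(left_out), std::move(right_out)};
}
```

The output order of this model is one possible order; the library does not document an ordering for the returned pairs.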

◆ conditional_inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::conditional_inner_join ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{1, 2}, {0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{1}, {0}}
Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
output_size  Optional value which allows users to specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional inner join between two tables left and right.

◆ conditional_inner_join_size()

std::size_t cudf::conditional_inner_join_size ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a conditional inner join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
The size that would result from performing the requested join

◆ conditional_left_anti_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::conditional_left_anti_join ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns an index vector corresponding to all rows in the left table for which there does not exist any row in the right table where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {0}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {0, 2}
Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
output_size  Optional value which allows users to specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A vector left_indices that can be used to construct the result of performing a conditional left anti join between two tables left and right.

◆ conditional_left_anti_join_size()

std::size_t cudf::conditional_left_anti_join_size ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a conditional left anti join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
The size that would result from performing the requested join

◆ conditional_left_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::conditional_left_join ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in left that have no match in right.

The first returned vector contains all the row indices from the left table (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right table, if there is a match or (2) an unspecified out-of-bounds value.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{0, 1, 2}, {None, 0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{0, 1, 2}, {None, 0, None}}
Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
output_size  Optional value which allows users to specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional left join between two tables left and right.
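
To make "can be used to construct the result" concrete, the following host-side sketch gathers one column through an index vector and treats any out-of-bounds index as a null row. It is an illustration only: gather_or_null is a hypothetical helper operating on std::vector, and std::optional stands in for a nullable column. In libcudf itself this step is normally a gather with an out-of-bounds policy that nullifies such indices, which is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

using size_type = std::int32_t;

// Host-side gather: out[k] = column[indices[k]], or an empty optional when
// indices[k] is out of bounds (standing in for a null row of an outer join).
std::vector<std::optional<int>> gather_or_null(std::vector<int> const& column,
                                               std::vector<size_type> const& indices)
{
  std::vector<std::optional<int>> out;
  out.reserve(indices.size());
  for (size_type idx : indices) {
    bool const in_bounds = idx >= 0 && static_cast<std::size_t>(idx) < column.size();
    if (in_bounds) {
      out.emplace_back(column[static_cast<std::size_t>(idx)]);
    } else {
      out.emplace_back(std::nullopt);
    }
  }
  return out;
}
```

With the first example above, gathering the right column {1, 2, 3} through right indices {None, 0, 1} yields a null followed by 1 and 2.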

◆ conditional_left_join_size()

std::size_t cudf::conditional_left_join_size ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a conditional left join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
The size that would result from performing the requested join

◆ conditional_left_semi_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::conditional_left_semi_join ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns an index vector corresponding to all rows in the left table for which there exists some row in the right table where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {1, 2}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {1}
Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
output_size  Optional value which allows users to specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A vector left_indices that can be used to construct the result of performing a conditional left semi join between two tables left and right.
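
Left semi and left anti joins select complementary sets of left rows, which a short host-side model can show. This is a sketch under stated assumptions, not the library's GPU implementation: reference_semi_or_anti and its keep_matched flag are hypothetical, and the predicate is a plain callable on row indices instead of an ast::expression.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using size_type = std::int32_t;

// keep_matched == true  models a left semi join (rows with at least one match).
// keep_matched == false models a left anti join (rows with no match at all).
std::vector<size_type> reference_semi_or_anti(
  size_type left_rows,
  size_type right_rows,
  std::function<bool(size_type, size_type)> const& pred,
  bool keep_matched)
{
  std::vector<size_type> out;
  for (size_type i = 0; i < left_rows; ++i) {
    bool matched = false;
    // Stop scanning the right table as soon as one match is found.
    for (size_type j = 0; j < right_rows && !matched; ++j) {
      matched = pred(i, j);
    }
    if (matched == keep_matched) { out.push_back(i); }
  }
  return out;
}
```

Each left row appears at most once in the output, regardless of how many right rows it matches.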

◆ conditional_left_semi_join_size()

std::size_t cudf::conditional_left_semi_join_size ( table_view const &  left,
table_view const &  right,
ast::expression const &  binary_predicate,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a conditional left semi join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Exceptions
cudf::data_type_error  if the binary predicate outputs a non-boolean result.
Parameters
left  The left table
right  The right table
binary_predicate  The condition on which to join
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
The size that would result from performing the requested join

◆ cross_join()

std::unique_ptr<cudf::table> cudf::cross_join ( cudf::table_view const &  left,
cudf::table_view const &  right,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Performs a cross join on two tables (left, right)

The cross join returns the cartesian product of rows from each table.

Note
Warning: This function can easily cause out-of-memory errors. The size of the output is equal to left.num_rows() * right.num_rows(). Use with caution.
Left a: {0, 1, 2}
Right b: {3, 4, 5}
Result: { a: {0, 0, 0, 1, 1, 1, 2, 2, 2}, b: {3, 4, 5, 3, 4, 5, 3, 4, 5} }
Exceptions
cudf::logic_error  if the number of columns in either left or right table is 0
Parameters
left  The left table
right  The right table
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table's device memory
Returns
Result of cross joining left and right tables
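
The row layout of the cartesian product, and why the output grows as left.num_rows() * right.num_rows(), can be seen in a minimal host-side sketch. reference_cross_join is a hypothetical helper over single-column std::vector inputs and is not the library's implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side cartesian product of two single-column "tables".
// Every value of a is repeated once for each value of b.
std::pair<std::vector<int>, std::vector<int>> reference_cross_join(std::vector<int> const& a,
                                                                   std::vector<int> const& b)
{
  std::vector<int> out_a;
  std::vector<int> out_b;
  // The output has a.size() * b.size() rows, which is what makes large inputs costly.
  std::size_t const out_rows = a.size() * b.size();
  out_a.reserve(out_rows);
  out_b.reserve(out_rows);
  for (int x : a) {
    for (int y : b) {
      out_a.push_back(x);
      out_b.push_back(y);
    }
  }
  return {std::move(out_a), std::move(out_b)};
}
```

Two inputs of one million rows each would already produce 10^12 output rows, so checking the product of the row counts before calling a cross join is a reasonable precaution.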

◆ filter_join_indices()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::filter_join_indices ( cudf::table_view const &  left,
cudf::table_view const &  right,
cudf::device_span< size_type const >  left_indices,
cudf::device_span< size_type const >  right_indices,
cudf::ast::expression const &  predicate,
cudf::join_kind  join_kind,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Filters join result indices based on a conditional predicate and join type.

This function takes the result indices from a hash/sort join operation and applies a conditional predicate to filter the pairs. It enables implementing mixed joins as a two-step process: equality-based join followed by conditional filtering.

The behavior depends on the join type:

  • INNER_JOIN: Only pairs that satisfy the predicate and have valid indices are kept.
  • LEFT_JOIN: All left rows are preserved. Failed predicates nullify right indices.
  • FULL_JOIN: All rows from both sides are preserved. Failed predicates create separate pairs.

Note on JoinNoMatch pairs: If an input pair already contains JoinNoMatch in either position, the predicate cannot be evaluated and the pair passes through unchanged. The "separate pairs" splitting only occurs when both indices are valid but the predicate fails. For example, a FULL_JOIN pair (5, 10) that fails the predicate becomes two pairs: (5, JoinNoMatch) and (JoinNoMatch, 10), ensuring both rows appear in the output.

Usage Pattern

Typical usage involves performing an equality-based hash join first, then filtering the results with a conditional predicate:

// Step 1: Perform equality-based hash join
auto hash_joiner = cudf::hash_join(right_equality_table, cudf::null_equality::EQUAL);
auto [left_indices, right_indices] = hash_joiner.inner_join(left_equality_table);

// Step 2: Apply conditional filter on conditional columns
auto [filtered_left, filtered_right] = cudf::filter_join_indices(
  left_conditional_table,   // Table with columns referenced by predicate
  right_conditional_table,  // Table with columns referenced by predicate
  *left_indices,            // Indices from hash join
  *right_indices,           // Indices from hash join
  predicate,                // AST expression: e.g., left.col0 > right.col0
  cudf::join_kind::INNER_JOIN);  // Join type

Example

Left equality: {id: [1, 2, 3]}
Right equality: {id: [1, 2, 3]}
Left conditional: {val: [10, 20, 30]}
Right conditional:{val: [15, 15, 25]}
Hash join (id == id): left_indices = {0, 1, 2}, right_indices = {0, 1, 2}
Predicate: left.val > right.val
INNER_JOIN result: left_indices = {1, 2}, right_indices = {1, 2} // 20>15, 30>25
LEFT_JOIN result: left_indices = {0, 1, 2}, right_indices = {JoinNoMatch, 1, 2}
Exceptions
std::invalid_argument  if join_kind is not INNER_JOIN, LEFT_JOIN, or FULL_JOIN.
std::invalid_argument  if left_indices and right_indices have different sizes.
Parameters
left  The left table for predicate evaluation (conditional columns only).
right  The right table for predicate evaluation (conditional columns only).
left_indices  Device span of row indices in the left table from hash join.
right_indices  Device span of row indices in the right table from hash join.
predicate  An AST expression that returns a boolean for each pair of rows.
join_kind  The type of join operation. Must be INNER_JOIN, LEFT_JOIN, or FULL_JOIN.
stream  CUDA stream used for kernel launches and memory operations.
mr  Device memory resource used to allocate output indices.
Returns
A pair of device vectors [filtered_left_indices, filtered_right_indices] corresponding to rows that satisfy the join semantics and predicate.
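The join-type rules and the example above can be modeled on the host with plain vectors. The following is a minimal sketch, not libcudf's implementation: `filter_join_indices_model`, the local `join_kind` enum, and the callable `pred` are hypothetical stand-ins, and the position of the split-off FULL_JOIN pairs in the output is an assumption, since the documentation does not specify an order. The LEFT_JOIN branch follows the text literally and does not try to merge a left row that has both passing and failing candidates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Host-side stand-ins for the libcudf names; not the library's own types.
using size_type = std::int32_t;
constexpr size_type JoinNoMatch = std::numeric_limits<size_type>::min();
enum class join_kind { INNER_JOIN, LEFT_JOIN, FULL_JOIN };
using index_pair = std::pair<std::vector<size_type>, std::vector<size_type>>;

// Hypothetical reference model of the filtering rules described above.
// pred(l, r) plays the role of the AST predicate evaluated on row l of the
// left conditional table and row r of the right conditional table.
index_pair filter_join_indices_model(std::vector<size_type> const& left_idx,
                                     std::vector<size_type> const& right_idx,
                                     std::function<bool(size_type, size_type)> const& pred,
                                     join_kind kind)
{
  assert(left_idx.size() == right_idx.size());
  index_pair out;
  index_pair split_right;  // (JoinNoMatch, r) halves of split FULL_JOIN pairs
  auto emit = [](index_pair& dst, size_type l, size_type r) {
    dst.first.push_back(l);
    dst.second.push_back(r);
  };
  for (std::size_t i = 0; i < left_idx.size(); ++i) {
    size_type const l = left_idx[i];
    size_type const r = right_idx[i];
    bool const both_valid = (l != JoinNoMatch) && (r != JoinNoMatch);
    if (!both_valid) {
      // The predicate cannot be evaluated: outer joins pass the pair through,
      // an inner join drops it.
      if (kind != join_kind::INNER_JOIN) { emit(out, l, r); }
    } else if (pred(l, r)) {
      emit(out, l, r);
    } else if (kind == join_kind::LEFT_JOIN) {
      emit(out, l, JoinNoMatch);  // keep the left row, nullify the right index
    } else if (kind == join_kind::FULL_JOIN) {
      emit(out, l, JoinNoMatch);  // first half of the split pair
      emit(split_right, JoinNoMatch, r);
    }
  }
  // Assumption: the split-off right halves are appended after all other pairs.
  out.first.insert(out.first.end(), split_right.first.begin(), split_right.first.end());
  out.second.insert(out.second.end(), split_right.second.begin(), split_right.second.end());
  return out;
}
```

Called with the indices `{0, 1, 2}` on both sides and a predicate comparing the `val` columns from the example, the model reproduces the INNER_JOIN and LEFT_JOIN results listed there.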

◆ full_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::full_join ( cudf::table_view const &  left_keys,
cudf::table_view const &  right_keys,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to a full join between the specified tables.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and JoinNoMatch, representing a row from one table without a match in the other.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{0, 1, 2, None}, {None, 0, 1, 2}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Exceptions
cudf::logic_error  if the number of elements in left_keys and right_keys does not match.
Parameters
[in]  left_keys  The left table
[in]  right_keys  The right table
[in]  compare_nulls  Controls whether null join-key values should match.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with left_keys and right_keys as the join keys.
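To make the pairwise reading of the two returned vectors concrete, here is a minimal host-side sketch of a full join on a single integer key column. It is a nested-loop reference model only: `full_join_model` is a hypothetical name, the `None` entries of the examples are represented by a local `JoinNoMatch` constant, and the output order (left rows first, then unmatched right rows) is an assumption because the real API leaves the order unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using size_type = std::int32_t;
constexpr size_type JoinNoMatch = std::numeric_limits<size_type>::min();
using index_pair = std::pair<std::vector<size_type>, std::vector<size_type>>;

// Hypothetical nested-loop reference for a full join on one integer key column.
index_pair full_join_model(std::vector<int> const& left_keys, std::vector<int> const& right_keys)
{
  index_pair out;
  std::vector<bool> right_matched(right_keys.size(), false);
  for (std::size_t l = 0; l < left_keys.size(); ++l) {
    bool matched = false;
    for (std::size_t r = 0; r < right_keys.size(); ++r) {
      if (left_keys[l] == right_keys[r]) {
        out.first.push_back(static_cast<size_type>(l));
        out.second.push_back(static_cast<size_type>(r));
        matched          = true;
        right_matched[r] = true;
      }
    }
    if (!matched) {
      // Left row without a partner: pair it with the sentinel.
      out.first.push_back(static_cast<size_type>(l));
      out.second.push_back(JoinNoMatch);
    }
  }
  for (std::size_t r = 0; r < right_keys.size(); ++r) {
    if (!right_matched[r]) {
      // Right row without a partner: the sentinel goes on the left side.
      out.first.push_back(JoinNoMatch);
      out.second.push_back(static_cast<size_type>(r));
    }
  }
  assert(out.first.size() == out.second.size());
  return out;
}
```

For the first example (`Left: {{0, 1, 2}}`, `Right: {{1, 2, 3}}`) the model yields the same pairs as the documented result, with `JoinNoMatch` in place of `None`.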

◆ inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::inner_join ( cudf::table_view const &  left_keys,
cudf::table_view const &  right_keys,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Exceptions
cudf::logic_error  if the number of elements in left_keys and right_keys does not match.
Parameters
[in]  left_keys  The left table
[in]  right_keys  The right table
[in]  compare_nulls  Controls whether null join-key values should match.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.
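The examples above write each table as a list of columns, so `{{0, 1, 2}, {3, 4, 5}}` is a two-column table with three rows, and two rows match only when every key column agrees. The sketch below models that reading on the host. `table` and `inner_join_model` are hypothetical names, and the check that both tables have the same number of key columns is an assumption about what the documented exception guards against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using size_type  = std::int32_t;
using index_pair = std::pair<std::vector<size_type>, std::vector<size_type>>;
// Column-major table as in the examples: table[c][r] is row r of column c.
using table = std::vector<std::vector<int>>;

// Hypothetical nested-loop reference for an inner join on multi-column keys.
index_pair inner_join_model(table const& left, table const& right)
{
  // Assumption: both key tables carry the same number of key columns.
  assert(left.size() == right.size());
  index_pair out;
  if (left.empty()) { return out; }
  std::size_t const n_left  = left.front().size();
  std::size_t const n_right = right.front().size();
  for (std::size_t l = 0; l < n_left; ++l) {
    for (std::size_t r = 0; r < n_right; ++r) {
      bool equal = true;
      for (std::size_t c = 0; c < left.size(); ++c) {
        if (left[c][l] != right[c][r]) {
          equal = false;
          break;
        }
      }
      if (equal) {
        out.first.push_back(static_cast<size_type>(l));
        out.second.push_back(static_cast<size_type>(r));
      }
    }
  }
  return out;
}
```

With two key columns only left row 1 `(1, 4)` equals right row 0 `(1, 4)`, which is why the second example returns a single pair.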

◆ left_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::left_join ( cudf::table_view const &  left_keys,
cudf::table_view const &  right_keys,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to a left join between the specified tables.

The first returned vector contains all the row indices from the left table (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right table, if there is a match or (2) JoinNoMatch.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{0, 1, 2}, {None, 0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{0, 1, 2}, {None, 0, None}}
Exceptions
cudf::logic_error  if the number of elements in left_keys and right_keys does not match.
Parameters
[in]  left_keys  The left table
[in]  right_keys  The right table
[in]  compare_nulls  Controls whether null join-key values should match.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with left_keys and right_keys as the join keys.
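Constructing the joined table from a left-join result means gathering right-table rows by index and producing a null wherever the index is JoinNoMatch. The following host-side sketch illustrates that step for one string column. `gather_right_nullable` is a hypothetical helper, and `std::optional` stands in for a nullable column; in libcudf this step would normally be done on the device with a gather operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

using size_type = std::int32_t;
constexpr size_type JoinNoMatch = std::numeric_limits<size_type>::min();

// Hypothetical helper: gather one right-table column using the right index
// vector of a left join, mapping the sentinel to a null (empty optional).
std::vector<std::optional<std::string>> gather_right_nullable(
  std::vector<std::string> const& right_column, std::vector<size_type> const& right_idx)
{
  std::vector<std::optional<std::string>> out;
  out.reserve(right_idx.size());
  for (size_type const r : right_idx) {
    if (r == JoinNoMatch) {
      out.push_back(std::nullopt);  // unmatched left row
    } else {
      assert(r >= 0 && static_cast<std::size_t>(r) < right_column.size());
      out.push_back(right_column[static_cast<std::size_t>(r)]);
    }
  }
  return out;
}
```

Applied to the right index vector `{None, 0, 1}` from the first example, the first output row is null and the remaining rows take right rows 0 and 1.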

◆ merge_inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::merge_inner_join ( cudf::table_view const &  left_keys,
cudf::table_view const &  right_keys,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

Assumes pre-sorted inputs and performs only the merge step. The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Deprecated:
Use the object-oriented sort_merge_join API cudf::sort_merge_join::inner_join instead
Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Exceptions
std::invalid_argument  if the number of elements in left_keys and right_keys does not match.
Parameters
[in]  left_keys  The left table
[in]  right_keys  The right table
[in]  compare_nulls  Controls whether null join-key values should match.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.

◆ mixed_full_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::mixed_full_join ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
output_size_data_type  output_size_data = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in either pair of tables that have no matches in the other pair.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and an unspecified out-of-bounds value, representing a row from one table without a match in the other.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
output_size_data  An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_full_join_size API).
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed full join between the four input tables.

◆ mixed_inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::mixed_inner_join ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
output_size_data_type  output_size_data = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{1}, {0}}
Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
output_size_data  An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_inner_join_size API).
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed inner join between the four input tables.
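The combination of an equality condition with a predicate that may return NULL can be illustrated with a small host-side model. This is a sketch under stated assumptions rather than the library's algorithm: `mixed_inner_join_model` and `greater_than` are hypothetical names, nullable values are represented with `std::optional`, and the predicate is fixed to the `Left.Column_0 > Right.Column_0` expression of the example. A NULL predicate result excludes the pair, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using size_type    = std::int32_t;
using index_pair   = std::pair<std::vector<size_type>, std::vector<size_type>>;
using nullable_int = std::optional<int>;

// Three-valued "greater than": the result is NULL if either operand is NULL.
std::optional<bool> greater_than(nullable_int const& l, nullable_int const& r)
{
  if (!l.has_value() || !r.has_value()) { return std::nullopt; }
  return *l > *r;
}

// Hypothetical reference model of a mixed inner join on one equality column
// and one conditional column per side.
index_pair mixed_inner_join_model(std::vector<int> const& left_eq,
                                  std::vector<int> const& right_eq,
                                  std::vector<nullable_int> const& left_cond,
                                  std::vector<nullable_int> const& right_cond)
{
  assert(left_eq.size() == left_cond.size());
  assert(right_eq.size() == right_cond.size());
  index_pair out;
  for (std::size_t l = 0; l < left_eq.size(); ++l) {
    for (std::size_t r = 0; r < right_eq.size(); ++r) {
      if (left_eq[l] != right_eq[r]) { continue; }
      // A NULL predicate result is treated like false: the pair is dropped.
      if (greater_than(left_cond[l], right_cond[r]).value_or(false)) {
        out.first.push_back(static_cast<size_type>(l));
        out.second.push_back(static_cast<size_type>(r));
      }
    }
  }
  return out;
}
```

On the example tables the only surviving pair is left row 1 with right row 0 (equal keys and 4 > 3). If the left conditional values are all NULL, the model returns no pairs.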

◆ mixed_inner_join_size()

std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::mixed_inner_join_size ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a mixed inner join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair containing the size that would result from performing the requested join and the number of matches for each row in one of the two tables. Which of the two tables is used is an implementation detail and should not be relied upon; pass the result to the corresponding mixed_inner_join API as is.
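One reason to precompute the size is that the join can then allocate its output once and let each row write into its own slice. The host-side sketch below shows that two-pass pattern. All names are hypothetical, the counts are taken per left row (the real API does not say which table it counts over), and the scan-then-write layout is an assumption about how such counts are commonly used, not a description of libcudf's kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using size_type  = std::int32_t;
using index_pair = std::pair<std::vector<size_type>, std::vector<size_type>>;
using size_data  = std::pair<std::size_t, std::vector<size_type>>;
using predicate  = std::function<bool(int, int)>;

// Pass 1 (hypothetical model): total output size and matches per left row.
size_data mixed_inner_join_size_model(std::vector<int> const& left_eq,
                                      std::vector<int> const& right_eq,
                                      std::vector<int> const& left_cond,
                                      std::vector<int> const& right_cond,
                                      predicate const& pred)
{
  assert(left_eq.size() == left_cond.size());
  assert(right_eq.size() == right_cond.size());
  size_data result{std::size_t{0}, std::vector<size_type>(left_eq.size(), 0)};
  for (std::size_t l = 0; l < left_eq.size(); ++l) {
    for (std::size_t r = 0; r < right_eq.size(); ++r) {
      if (left_eq[l] == right_eq[r] && pred(left_cond[l], right_cond[r])) {
        ++result.second[l];
        ++result.first;
      }
    }
  }
  return result;
}

// Pass 2 (hypothetical model): allocate exactly once, then write each left
// row's matches at the offset given by an exclusive scan of the counts.
index_pair mixed_inner_join_presized(std::vector<int> const& left_eq,
                                     std::vector<int> const& right_eq,
                                     std::vector<int> const& left_cond,
                                     std::vector<int> const& right_cond,
                                     predicate const& pred,
                                     size_data const& sizes)
{
  auto const& [total, counts] = sizes;
  assert(counts.size() == left_eq.size());
  std::vector<std::size_t> offsets(counts.size(), 0);
  for (std::size_t l = 1; l < counts.size(); ++l) {
    offsets[l] = offsets[l - 1] + static_cast<std::size_t>(counts[l - 1]);
  }
  index_pair out{std::vector<size_type>(total), std::vector<size_type>(total)};
  for (std::size_t l = 0; l < left_eq.size(); ++l) {
    std::size_t pos = offsets[l];
    for (std::size_t r = 0; r < right_eq.size(); ++r) {
      if (left_eq[l] == right_eq[r] && pred(left_cond[l], right_cond[r])) {
        out.first[pos]  = static_cast<size_type>(l);
        out.second[pos] = static_cast<size_type>(r);
        ++pos;
      }
    }
  }
  return out;
}
```

The second pass trusts the counts it is given, which mirrors the warning that incorrect size data leads to undefined behavior in the real API.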

◆ mixed_left_anti_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::mixed_left_anti_join ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns an index vector corresponding to all rows in the left tables for which there is no row in the right tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), the left row is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {0, 2}
Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A vector of indices from the left table that do not have matches in the right table.

◆ mixed_left_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::mixed_left_join ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
output_size_data_type  output_size_data = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in left that have no match in right.

The first returned vector contains all the row indices from the left tables (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right tables, or (2) an unspecified out-of-bounds value if the left row has no match.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{0, 1, 2}, {None, 0, None}}
Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
output_size_data  An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_left_join_size API).
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed left join between the four input tables.

◆ mixed_left_join_size()

std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::mixed_left_join_size ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns the exact number of matches (rows) when performing a mixed left join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair containing the size that would result from performing the requested join and the number of matches for each row in one of the two tables. Which of the two tables is used is an implementation detail and should not be relied upon; pass the result to the corresponding mixed_left_join API as is.

◆ mixed_left_semi_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::mixed_left_semi_join ( table_view const &  left_equality,
table_view const &  right_equality,
table_view const &  left_conditional,
table_view const &  right_conditional,
ast::expression const &  binary_predicate,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns an index vector corresponding to all rows in the left tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), the left row is not included in the output. It is the user's responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {1}
Exceptions
cudf::data_type_error  If the binary predicate outputs a non-boolean result.
cudf::logic_error  If the number of rows in left_equality and left_conditional do not match.
cudf::logic_error  If the number of rows in right_equality and right_conditional do not match.
Parameters
left_equality  The left table used for the equality join
right_equality  The right table used for the equality join
left_conditional  The left table used for the conditional join
right_conditional  The right table used for the conditional join
binary_predicate  The condition on which to join
compare_nulls  Whether null values join to each other
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A vector of indices from the left table that have matches in the right table.
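Semi and anti joins return each left row at most once, no matter how many right rows match it, and the two results partition the left table. A minimal host-side sketch of both, using hypothetical names and a plain callable in place of the AST predicate, looks like this; the ascending output order is a property of the sketch, not a guarantee of the real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using size_type = std::int32_t;

// Hypothetical reference model: keep_matched == true models a mixed left semi
// join, keep_matched == false models a mixed left anti join.
std::vector<size_type> mixed_left_semi_anti_model(std::vector<int> const& left_eq,
                                                  std::vector<int> const& right_eq,
                                                  std::vector<int> const& left_cond,
                                                  std::vector<int> const& right_cond,
                                                  std::function<bool(int, int)> const& pred,
                                                  bool keep_matched)
{
  assert(left_eq.size() == left_cond.size());
  assert(right_eq.size() == right_cond.size());
  std::vector<size_type> out;
  for (std::size_t l = 0; l < left_eq.size(); ++l) {
    bool has_match = false;
    for (std::size_t r = 0; r < right_eq.size(); ++r) {
      if (left_eq[l] == right_eq[r] && pred(left_cond[l], right_cond[r])) {
        has_match = true;
        break;  // one match is enough; the left row is emitted at most once
      }
    }
    if (has_match == keep_matched) { out.push_back(static_cast<size_type>(l)); }
  }
  return out;
}
```

On the example tables the semi variant returns `{1}` and the anti variant returns `{0, 2}`, matching the results documented for mixed_left_semi_join and mixed_left_anti_join.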

◆ sort_merge_inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::sort_merge_inner_join ( cudf::table_view const &  left_keys,
cudf::table_view const &  right_keys,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Deprecated:
Use the object-oriented sort_merge_join API cudf::sort_merge_join::inner_join instead
Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}
Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Exceptions
std::invalid_argument  if the number of elements in left_keys and right_keys does not match.
Parameters
[in]  left_keys  The left table
[in]  right_keys  The right table
[in]  compare_nulls  Controls whether null join-key values should match.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory
Returns
A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.

Variable Documentation

◆ JoinNoMatch

constexpr CUDF_HOST_DEVICE size_type cudf::JoinNoMatch = cuda::std::numeric_limits<size_type>::min()

Sentinel value used to indicate an unmatched row index in join operations.

This value is used in join result indices to represent rows that do not have a match in the other table (e.g., in left joins, full joins, or when using filter_gather_map with null indices from outer joins).

The value is set to the minimum possible value for size_type to ensure it's easily distinguishable from valid row indices, which are always non-negative.

Definition at line 55 of file join.hpp.
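Because size_type is a signed 32-bit integer in libcudf, its minimum value can never collide with a valid, non-negative row index. The short host-side sketch below restates that property; the local `JoinNoMatch` mirrors the definition above using `std::numeric_limits`, and `is_match` and `count_unmatched` are hypothetical helpers, not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using size_type = std::int32_t;
// Host-side mirror of the sentinel definition shown above.
constexpr size_type JoinNoMatch = std::numeric_limits<size_type>::min();

// Valid row indices are non-negative, so the sentinel cannot be mistaken for one.
static_assert(JoinNoMatch < 0, "sentinel must lie outside the valid index range");

// Hypothetical helper: true if the index refers to an actual row.
constexpr bool is_match(size_type idx) { return idx != JoinNoMatch; }

// Hypothetical helper: number of unmatched entries in one side of a join result.
std::size_t count_unmatched(std::vector<size_type> const& indices)
{
  return static_cast<std::size_t>(std::count(indices.begin(), indices.end(), JoinNoMatch));
}
```

A check such as `is_match` is only meaningful for joins documented to use JoinNoMatch; functions that describe unmatched rows as "an unspecified out-of-bounds value" make no such promise on this page.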