Column Join#

group Joining

Enums

enum class nullable_join : bool#

Specifies whether any of the input join tables (the build table and any later probe table) contains nulls.

This is passed at hash_join object construction to indicate whether nulls may exist in any of the possible input tables. If it is unknown whether nulls exist, use YES as the default option.

Values:

enumerator YES#
enumerator NO#

Functions

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> conditional_inner_join(table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{1, 2}, {0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{1}, {0}}
Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional inner join between two tables left and right.
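The semantics described above, including the rule that a NULL predicate result drops the pair, can be modelled on the host with a nested loop. This is only a reference sketch in plain C++ and not how libcudf implements the join on the GPU: `conditional_inner_join_reference`, `predicate`, and `first_documented_example` are hypothetical names, `size_type` is assumed to mirror `cudf::size_type`, and `std::nullopt` stands in for a NULL predicate result.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

using size_type  = std::int32_t;  // assumption: mirrors cudf::size_type
using index_pair = std::pair<std::vector<size_type>, std::vector<size_type>>;

// Host-side stand-in for the AST predicate; std::nullopt models a NULL result.
using predicate = std::function<std::optional<bool>(size_type, size_type)>;

index_pair conditional_inner_join_reference(size_type left_rows,
                                            size_type right_rows,
                                            predicate const& pred)
{
  index_pair out;
  for (size_type l = 0; l < left_rows; ++l) {
    for (size_type r = 0; r < right_rows; ++r) {
      // Keep the pair only when the predicate is exactly true;
      // both false and NULL (nullopt) drop it.
      if (pred(l, r).value_or(false)) {
        out.first.push_back(l);
        out.second.push_back(r);
      }
    }
  }
  return out;
}

// Hypothetical helper that evaluates the first documented example.
index_pair first_documented_example()
{
  std::vector<int> const left{0, 1, 2};
  std::vector<int> const right{1, 2, 3};
  return conditional_inner_join_reference(
    3, 3, [&](size_type l, size_type r) -> std::optional<bool> { return left[l] == right[r]; });
}
```

The sketch emits pairs in left-major order, whereas the library leaves the order unspecified, so callers should not rely on any particular ordering.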

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> conditional_left_join(table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in left that have no match in right.

The first returned vector contains all the row indices from the left table (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right table, if there is a match or (2) an unspecified out-of-bounds value.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{0, 1, 2}, {None, 0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{0, 1, 2}, {None, 0, None}}
Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional left join between two tables left and right.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> conditional_full_join(table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the predicate evaluates to true, or null matches for rows in either table that have no match in the other.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and an unspecified out-of-bounds value, representing a row from one table without a match in the other.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {{0, 1, 2, None}, {None, 0, 1, 2}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a conditional full join between two tables left and right.
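A full join can be thought of as every left row (matched or not) followed by the right rows that never matched. The host-side sketch below models that in plain C++; it is not the library's implementation. `full_join_reference` is a hypothetical name, and the sentinel `none` is chosen only for this sketch, since the documentation above leaves the out-of-bounds value unspecified.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type
constexpr size_type none = -1;   // sketch-only sentinel; libcudf's value is unspecified

std::pair<std::vector<size_type>, std::vector<size_type>> full_join_reference(
  size_type left_rows,
  size_type right_rows,
  std::function<bool(size_type, size_type)> const& pred)
{
  std::vector<size_type> left_indices, right_indices;
  std::vector<bool> right_matched(static_cast<std::size_t>(right_rows), false);

  for (size_type l = 0; l < left_rows; ++l) {
    bool matched = false;
    for (size_type r = 0; r < right_rows; ++r) {
      if (pred(l, r)) {
        left_indices.push_back(l);
        right_indices.push_back(r);
        right_matched[static_cast<std::size_t>(r)] = true;
        matched = true;
      }
    }
    // A left row with no match is still emitted once, paired with the sentinel.
    if (!matched) {
      left_indices.push_back(l);
      right_indices.push_back(none);
    }
  }
  // Right rows that never matched are appended, paired with the sentinel.
  for (size_type r = 0; r < right_rows; ++r) {
    if (!right_matched[static_cast<std::size_t>(r)]) {
      left_indices.push_back(none);
      right_indices.push_back(r);
    }
  }
  return {std::move(left_indices), std::move(right_indices)};
}
```

Run on the second documented example with the two-column equality predicate, this sketch produces the same pairs as the Result line, in the same layout.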

std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_semi_join(table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns an index vector corresponding to all rows in the left table for which there exists some row in the right table where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {1, 2}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {1}
Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector left_indices that can be used to construct the result of performing a conditional left semi join between two tables left and right.

std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_anti_join(table_view const &left, table_view const &right, ast::expression const &binary_predicate, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns an index vector corresponding to all rows in the left table for which there does not exist any row in the right table where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Expression: Left.Column_0 == Right.Column_0
Result: {0}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Expression: (Left.Column_0 == Right.Column_0) AND (Left.Column_1 == Right.Column_1)
Result: {0, 2}
Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector left_indices that can be used to construct the result of performing a conditional left anti join between two tables left and right.
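Left semi and left anti joins are complementary: each left row lands in exactly one of the two results, and at most once, no matter how many right rows it matches. The sketch below models both with one host-side function in plain C++. `filter_left_rows` and the `keep` selector are hypothetical names introduced for illustration, and NULL predicate results are not modelled.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type

enum class keep { matched, unmatched };  // hypothetical: matched = semi, unmatched = anti

std::vector<size_type> filter_left_rows(size_type left_rows,
                                        size_type right_rows,
                                        std::function<bool(size_type, size_type)> const& pred,
                                        keep mode)
{
  std::vector<size_type> left_indices;
  for (size_type l = 0; l < left_rows; ++l) {
    // Stop scanning the right table as soon as one match is found.
    bool has_match = false;
    for (size_type r = 0; r < right_rows && !has_match; ++r) {
      has_match = pred(l, r);
    }
    if (has_match == (mode == keep::matched)) { left_indices.push_back(l); }
  }
  return left_indices;
}
```

With the second documented example, `keep::matched` yields row 1 and `keep::unmatched` yields rows 0 and 2, so the two results together cover all three left rows.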

std::size_t conditional_inner_join_size(table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a conditional inner join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

The size that would result from performing the requested join
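One plausible use of an exact size is a two-pass pattern: count the matches first, then materialize the indices into storage sized exactly once. The host-side sketch below illustrates that pattern in plain C++. The function names are hypothetical, and the assumption that this size is what the `output_size` argument of `conditional_inner_join` expects is inferred from the parameter descriptions rather than stated by them.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type
using predicate = std::function<bool(size_type, size_type)>;

// Pass 1: count matching pairs without materializing them.
std::size_t inner_join_size_reference(size_type left_rows,
                                      size_type right_rows,
                                      predicate const& pred)
{
  std::size_t count = 0;
  for (size_type l = 0; l < left_rows; ++l) {
    for (size_type r = 0; r < right_rows; ++r) {
      if (pred(l, r)) { ++count; }
    }
  }
  return count;
}

// Pass 2: write the left indices into storage reserved from the known size.
std::vector<size_type> inner_join_left_indices(size_type left_rows,
                                               size_type right_rows,
                                               predicate const& pred,
                                               std::size_t output_size)
{
  std::vector<size_type> left_indices;
  left_indices.reserve(output_size);  // single allocation, no regrowth
  for (size_type l = 0; l < left_rows; ++l) {
    for (size_type r = 0; r < right_rows; ++r) {
      if (pred(l, r)) { left_indices.push_back(l); }
    }
  }
  return left_indices;
}
```

The trade-off is that the predicate is evaluated twice, so the pattern pays off mainly when the size is reused or when over-allocation is costly.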

std::size_t conditional_left_join_size(table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a conditional left join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

The size that would result from performing the requested join

std::size_t conditional_left_semi_join_size(table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a conditional left semi join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

The size that would result from performing the requested join

std::size_t conditional_left_anti_join_size(table_view const &left, table_view const &right, ast::expression const &binary_predicate, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a conditional left anti join between the specified tables where the predicate evaluates to true.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output.

Throws:

cudf::data_type_error – if the binary predicate outputs a non-boolean result.

Parameters:
  • left – The left table

  • right – The right table

  • binary_predicate – The condition on which to join

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

The size that would result from performing the requested join

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> inner_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Throws:

cudf::logic_error – if the number of elements in left_keys and right_keys does not match.

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.
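The effect of compare_nulls is easiest to see with keys that contain nulls. The sketch below is a host-side model in plain C++, not libcudf code: the local `null_equality` enum is a stand-in for `cudf::null_equality`, `std::optional<int>` models a nullable single-column key, and `keys_match` and `inner_join_reference` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using size_type = std::int32_t;               // assumption: mirrors cudf::size_type
enum class null_equality { EQUAL, UNEQUAL };  // local stand-in for cudf::null_equality
using key = std::optional<int>;               // std::nullopt models a null key

bool keys_match(key const& a, key const& b, null_equality compare_nulls)
{
  if (a.has_value() && b.has_value()) { return *a == *b; }
  // Two nulls match only when nulls are treated as equal.
  if (!a.has_value() && !b.has_value()) { return compare_nulls == null_equality::EQUAL; }
  return false;  // a null never matches a non-null value
}

std::pair<std::vector<size_type>, std::vector<size_type>> inner_join_reference(
  std::vector<key> const& left_keys,
  std::vector<key> const& right_keys,
  null_equality compare_nulls)
{
  std::vector<size_type> left_indices, right_indices;
  for (std::size_t l = 0; l < left_keys.size(); ++l) {
    for (std::size_t r = 0; r < right_keys.size(); ++r) {
      if (keys_match(left_keys[l], right_keys[r], compare_nulls)) {
        left_indices.push_back(static_cast<size_type>(l));
        right_indices.push_back(static_cast<size_type>(r));
      }
    }
  }
  return {std::move(left_indices), std::move(right_indices)};
}
```

For left keys {1, null} and right keys {null, 1}, the EQUAL setting pairs the two nulls in addition to the two 1s, while UNEQUAL returns only the pair of 1s.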

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> left_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to a left join between the specified tables.

The first returned vector contains all the row indices from the left table (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right table, if there is a match or (2) an unspecified out-of-bounds value.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{0, 1, 2}, {None, 0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{0, 1, 2}, {None, 0, None}}
Throws:

cudf::logic_error – if the number of elements in left_keys and right_keys does not match.

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with left_keys and right_keys as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> full_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to a full join between the specified tables.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and an unspecified out-of-bounds value, representing a row from one table without a match in the other.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{0, 1, 2, None}, {None, 0, 1, 2}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Throws:

cudf::logic_error – if the number of elements in left_keys and right_keys does not match.

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with left_keys and right_keys as the join keys.

std::unique_ptr<rmm::device_uvector<size_type>> left_semi_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a vector of row indices corresponding to a left semi-join between the specified tables.

The returned vector contains the row indices from the left table for which there is a matching row in the right table.

TableA: {{0, 1, 2}}
TableB: {{1, 2, 3}}
Result: {1, 2}
Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector left_indices that can be used to construct the result of performing a left semi join between two tables with left_keys and right_keys as the join keys.
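For equality keys, a semi join reduces to a membership test: build a set from the right keys, then probe it with each left key. The sketch below shows this on the host in plain C++ for a single integer key column without nulls. It illustrates the build/probe idea only and makes no claim about how libcudf implements it; `left_semi_join_reference` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type

std::vector<size_type> left_semi_join_reference(std::vector<int> const& left_keys,
                                                std::vector<int> const& right_keys)
{
  // Build phase: duplicates in the right table collapse into one set entry.
  std::unordered_set<int> const build(right_keys.begin(), right_keys.end());

  // Probe phase: each left row is emitted at most once.
  std::vector<size_type> left_indices;
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    if (build.count(left_keys[i]) > 0) { left_indices.push_back(static_cast<size_type>(i)); }
  }
  return left_indices;
}
```

Applied to TableA {0, 1, 2} and TableB {1, 2, 3}, this returns the indices 1 and 2, matching the Result line above.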

std::unique_ptr<rmm::device_uvector<size_type>> left_anti_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a vector of row indices corresponding to a left anti join between the specified tables.

The returned vector contains the row indices from the left table for which there is no matching row in the right table.

TableA: {{0, 1, 2}}
TableB: {{1, 2, 3}}
Result: {0}
Throws:

cudf::logic_error – if the number of columns in either left_keys or right_keys is 0

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector left_indices that can be used to construct the result of performing a left anti join between two tables with left_keys and right_keys as the join keys.

std::unique_ptr<cudf::table> cross_join(cudf::table_view const &left, cudf::table_view const &right, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Performs a cross join on two tables (left, right).

The cross join returns the cartesian product of rows from each table.

Left a: {0, 1, 2}
Right b: {3, 4, 5}
Result: { a: {0, 0, 0, 1, 1, 1, 2, 2, 2}, b: {3, 4, 5, 3, 4, 5, 3, 4, 5} }

Note

Warning: This function can easily cause out-of-memory errors. The size of the output is equal to left.num_rows() * right.num_rows(). Use with caution.

Throws:

cudf::logic_error – if the number of columns in either left or right table is 0

Parameters:
  • left – The left table

  • right – The right table

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table’s device memory

Returns:

Result of cross joining left and right tables
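The cartesian product and its size can be checked with a small host-side model in plain C++. `cross_join_reference` is a hypothetical name and each table is reduced to a single integer column. Reserving `a.size() * b.size()` elements up front makes the quadratic memory cost mentioned in the warning explicit.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the two output columns of the cross join of single-column tables a and b.
std::pair<std::vector<int>, std::vector<int>> cross_join_reference(std::vector<int> const& a,
                                                                   std::vector<int> const& b)
{
  std::size_t const output_rows = a.size() * b.size();  // grows multiplicatively
  std::vector<int> out_a, out_b;
  out_a.reserve(output_rows);
  out_b.reserve(output_rows);
  for (int x : a) {
    for (int y : b) {
      // Every left row is paired with every right row.
      out_a.push_back(x);
      out_b.push_back(y);
    }
  }
  return {std::move(out_a), std::move(out_b)};
}
```

For the documented inputs a = {0, 1, 2} and b = {3, 4, 5}, the model yields nine rows with the same contents as the Result line.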

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_inner_join(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, std::optional<std::pair<std::size_t, device_span<size_type const>>> output_size_data = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{1}, {0}}
Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • output_size_data – An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_inner_join_size API).

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed inner join between the four input tables.
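A mixed join keeps a pair only when both conditions hold: the equality keys are equal and the predicate is true on the conditional columns. The host-side sketch below models this in plain C++ for the documented example, with the predicate Left.Column_0 > Right.Column_0 hard-coded. All names are hypothetical, and nulls are not modelled. The per-row counts are taken per left row, which is an assumption made for illustration; the parameter description above refers to the larger of the two input tables.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type

// Hypothetical single-column stand-ins for the four input tables.
struct mixed_tables {
  std::vector<int> left_equality, right_equality, left_conditional, right_conditional;
};

// True when the equality keys are equal AND the conditional predicate
// (Left.Column_0 > Right.Column_0, as in the documented example) holds.
bool mixed_match(mixed_tables const& t, std::size_t l, std::size_t r)
{
  return t.left_equality[l] == t.right_equality[r] &&
         t.left_conditional[l] > t.right_conditional[r];
}

std::pair<std::vector<size_type>, std::vector<size_type>> mixed_inner_join_reference(
  mixed_tables const& t)
{
  std::vector<size_type> left_indices, right_indices;
  for (std::size_t l = 0; l < t.left_equality.size(); ++l) {
    for (std::size_t r = 0; r < t.right_equality.size(); ++r) {
      if (mixed_match(t, l, r)) {
        left_indices.push_back(static_cast<size_type>(l));
        right_indices.push_back(static_cast<size_type>(r));
      }
    }
  }
  return {std::move(left_indices), std::move(right_indices)};
}

// Number of matches for each left row; the sum is the exact inner join size.
std::vector<size_type> matches_per_left_row(mixed_tables const& t)
{
  std::vector<size_type> counts(t.left_equality.size(), 0);
  for (std::size_t l = 0; l < t.left_equality.size(); ++l) {
    for (std::size_t r = 0; r < t.right_equality.size(); ++r) {
      if (mixed_match(t, l, r)) { ++counts[l]; }
    }
  }
  return counts;
}
```

On the documented inputs, left row 2 passes the equality check against right row 1 but fails the predicate (4 > 4 is false), so only the pair (1, 0) survives and the per-row counts are {0, 1, 0}.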

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_left_join(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, std::optional<std::pair<std::size_t, device_span<size_type const>>> output_size_data = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in left that have no match in right.

The first returned vector contains all the row indices from the left tables (in unspecified order). The corresponding value in the second returned vector is either (1) the row index of the matched row from the right tables, if there is a match, or (2) an unspecified out-of-bounds value.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{0, 1, 2}, {None, 0, None}}
Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • output_size_data – An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_left_join_size API).

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed left join between the four input tables.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_full_join(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, std::optional<std::pair<std::size_t, device_span<size_type const>>> output_size_data = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to all pairs of rows between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables, or null matches for rows in either pair of tables that have no matches in the other pair.

Taken pairwise, the values from the returned vectors are one of: (1) row indices corresponding to matching rows from the left and right tables, (2) a row index and an unspecified out-of-bounds value, representing a row from one table without a match in the other.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • output_size_data – An optional pair of values indicating the exact output size and the number of matches for each row in the larger of the two input tables, left or right (may be precomputed using the corresponding mixed_full_join_size API).

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a mixed full join between the four input tables.

std::unique_ptr<rmm::device_uvector<size_type>> mixed_left_semi_join(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns an index vector corresponding to all rows in the left tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), the left row is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {1}
Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector left_indices that can be used to construct the result of performing a mixed left semi join between the four input tables.

std::unique_ptr<rmm::device_uvector<size_type>> mixed_left_anti_join(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns an index vector corresponding to all rows in the left tables for which there is no row in the right tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), the left row is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

If the provided output size or per-row counts are incorrect, behavior is undefined.

left_equality: {{0, 1, 2}}
right_equality: {{1, 2, 3}}
left_conditional: {{4, 4, 4}}
right_conditional: {{3, 4, 5}}
Expression: Left.Column_0 > Right.Column_0
Result: {0, 2}
Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A vector of left-table row indices that can be used to construct the result of performing a mixed left anti join between the four input tables.
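
As a rough illustration of the anti join semantics, the following host-side sketch keeps exactly those left rows for which no right row has an equal key and a true predicate. It assumes single-column integer tables, no nulls, and the fixed predicate Left.Column_0 > Right.Column_0 from the example; the function name is hypothetical and the code is not how cuDF implements the join.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: one int column per table, no nulls,
// predicate hard-coded to left_conditional > right_conditional.
std::vector<int> mixed_left_anti_join_reference(
  std::vector<int> const& left_equality,
  std::vector<int> const& right_equality,
  std::vector<int> const& left_conditional,
  std::vector<int> const& right_conditional)
{
  assert(left_equality.size() == left_conditional.size());
  assert(right_equality.size() == right_conditional.size());

  std::vector<int> left_indices;
  for (std::size_t l = 0; l < left_equality.size(); ++l) {
    bool matched = false;
    for (std::size_t r = 0; r < right_equality.size() && !matched; ++r) {
      matched = left_equality[l] == right_equality[r] &&
                left_conditional[l] > right_conditional[r];
    }
    // Keep the left row only when nothing on the right matched it.
    if (!matched) { left_indices.push_back(static_cast<int>(l)); }
  }
  return left_indices;
}
```

For the example tables, left row 1 is the only row with a match, so rows 0 and 2 remain.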

std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_inner_join_size(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a mixed inner join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair containing the size that would result from performing the requested join and the number of matches for each row in one of the two tables. Which of the two tables is used is an implementation detail and should not be relied upon; the result should simply be passed to the corresponding mixed_inner_join API as is.
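
The shape of the returned pair can be sketched on the host. The sketch below counts matches per left row, which is an assumption made for illustration only (the documentation leaves the choice of table unspecified). It also assumes single-column integer tables, no nulls, and the predicate Left.Column_0 > Right.Column_0; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical host-side reference returning {total matches, matches per left row}.
// Counting per LEFT row is an assumption of this sketch, not documented behavior.
std::pair<std::size_t, std::vector<int>> mixed_inner_join_size_reference(
  std::vector<int> const& left_equality,
  std::vector<int> const& right_equality,
  std::vector<int> const& left_conditional,
  std::vector<int> const& right_conditional)
{
  assert(left_equality.size() == left_conditional.size());
  assert(right_equality.size() == right_conditional.size());

  std::vector<int> matches_per_row(left_equality.size(), 0);
  for (std::size_t l = 0; l < left_equality.size(); ++l) {
    for (std::size_t r = 0; r < right_equality.size(); ++r) {
      if (left_equality[l] == right_equality[r] &&
          left_conditional[l] > right_conditional[r]) {
        ++matches_per_row[l];
      }
    }
  }
  // The join size is the sum of the per-row match counts.
  std::size_t const total =
    std::accumulate(matches_per_row.begin(), matches_per_row.end(), std::size_t{0});
  return {total, matches_per_row};
}
```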

std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_left_join_size(table_view const &left_equality, table_view const &right_equality, table_view const &left_conditional, table_view const &right_conditional, ast::expression const &binary_predicate, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the exact number of matches (rows) when performing a mixed left join between the specified tables where the columns of the equality table are equal and the predicate evaluates to true on the conditional tables.

If the provided predicate returns NULL for a pair of rows (left, right), that pair is not included in the output. It is the user’s responsibility to choose a suitable compare_nulls value AND use appropriate null-safe operators in the expression.

Throws:
  • cudf::data_type_error – If the binary predicate outputs a non-boolean result.

  • cudf::logic_error – If the number of rows in left_equality and left_conditional do not match.

  • cudf::logic_error – If the number of rows in right_equality and right_conditional do not match.

Parameters:
  • left_equality – The left table used for the equality join

  • right_equality – The right table used for the equality join

  • left_conditional – The left table used for the conditional join

  • right_conditional – The right table used for the conditional join

  • binary_predicate – The condition on which to join

  • compare_nulls – Whether null values join to each other

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair containing the size that would result from performing the requested join and the number of matches for each row in one of the two tables. Which of the two tables is used is an implementation detail and should not be relied upon; the result should simply be passed to the corresponding mixed_left_join API as is.
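
To see why a left join size differs from an inner join size, consider the following arithmetic sketch. It assumes that per-left-row match counts are available on the host (an assumption for illustration; the real counts live in device memory and their owning table is unspecified) and relies on the usual left join rule that an unmatched left row still produces one output row. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: output size of a left join given matches per left row.
// A left row with zero matches still contributes exactly one output row.
std::size_t left_join_size_from_counts(std::vector<int> const& matches_per_left_row)
{
  std::size_t total = 0;
  for (int count : matches_per_left_row) {
    assert(count >= 0);
    total += static_cast<std::size_t>(std::max(count, 1));
  }
  return total;
}
```

For match counts {0, 1, 0} an inner join would yield 1 row, while the left join yields 3 because both unmatched left rows are kept.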

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> sort_merge_inner_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Throws:

cudf::logic_error – If the number of elements in left_keys and right_keys does not match.

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.
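
The general sort-merge technique can be sketched on the host as follows. This is a generic textbook version under stated assumptions (single integer key column, no nulls, std::vector inputs), not cuDF's GPU implementation, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns the row indices that would sort `keys` in ascending order.
std::vector<int> sorted_order(std::vector<int> const& keys)
{
  std::vector<int> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return keys[a] < keys[b]; });
  return order;
}

// Hypothetical host-side reference: sort both sides, then merge equal-key runs.
std::pair<std::vector<int>, std::vector<int>> sort_merge_inner_join_reference(
  std::vector<int> const& left_keys, std::vector<int> const& right_keys)
{
  auto const left_order  = sorted_order(left_keys);
  auto const right_order = sorted_order(right_keys);

  std::vector<int> left_indices, right_indices;
  std::size_t l = 0, r = 0;
  while (l < left_order.size() && r < right_order.size()) {
    int const lk = left_keys[left_order[l]];
    int const rk = right_keys[right_order[r]];
    if (lk < rk) {
      ++l;
    } else if (rk < lk) {
      ++r;
    } else {
      // Locate the end of the run of equal keys on the right side.
      std::size_t r_end = r;
      while (r_end < right_order.size() && right_keys[right_order[r_end]] == lk) { ++r_end; }
      // Pair every left row in the run with every right row in the run.
      for (; l < left_order.size() && left_keys[left_order[l]] == lk; ++l) {
        for (std::size_t k = r; k < r_end; ++k) {
          left_indices.push_back(left_order[l]);
          right_indices.push_back(right_order[k]);
        }
      }
      r = r_end;
    }
  }
  assert(left_indices.size() == right_indices.size());
  return {left_indices, right_indices};
}
```

The emitted indices refer to the original, unsorted row positions, which matches how the returned index vectors are meant to be used for gathering. The output order of this sketch is an artifact of the merge; the documented API leaves the order unspecified.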

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> merge_inner_join(cudf::table_view const &left_keys, cudf::table_view const &right_keys, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns a pair of row index vectors corresponding to an inner join between the specified tables.

Assumes pre-sorted inputs and performs only the merge step. The first returned vector contains the row indices from the left table that have a match in the right table (in unspecified order). The corresponding values in the second returned vector are the matched row indices from the right table.

Left: {{0, 1, 2}}
Right: {{1, 2, 3}}
Result: {{1, 2}, {0, 1}}

Left: {{0, 1, 2}, {3, 4, 5}}
Right: {{1, 2, 3}, {4, 6, 7}}
Result: {{1}, {0}}
Throws:

cudf::logic_error – If the number of elements in left_keys and right_keys does not match.

Parameters:
  • left_keys – The left table

  • right_keys – The right table

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left_keys and right_keys as the join keys.

class distinct_hash_join#
#include <distinct_hash_join.hpp>

Distinct hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

This class enables the distinct hash join scheme, which builds the hash table once and probes it as many times as needed (possibly in parallel).

Note

Behavior is undefined if the build table contains duplicates.

Note

All NaNs are considered equal.

Public Functions

distinct_hash_join(cudf::table_view const &build, null_equality compare_nulls = null_equality::EQUAL, double load_factor = 0.5, rmm::cuda_stream_view stream = cudf::get_default_stream())#

Constructs a distinct hash join object for subsequent probe calls.

Throws:
  • cudf::logic_error – if the build table has no columns

  • std::invalid_argument – if load_factor is not greater than 0 and less than or equal to 1

Parameters:
  • build – The build table that contains distinct elements

  • compare_nulls – Controls whether null join-key values should match or not

  • load_factor – The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.

  • stream – CUDA stream used for device memory operations and kernel launches
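
The relationship between load_factor and hash table size can be made concrete with a small calculation. The sketch below is an assumption about how a slot count could be derived from a load factor; the real implementation may round the capacity up further, which is consistent with the note that actual occupancy may end up slightly lower. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: minimum number of slots so that num_build_rows entries
// fill at most `load_factor` of the table.
std::size_t slots_for_load_factor(std::size_t num_build_rows, double load_factor)
{
  // Same validity range as documented for the constructor: (0, 1].
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  auto const slots =
    static_cast<std::size_t>(std::ceil(static_cast<double>(num_build_rows) / load_factor));
  assert(slots >= num_build_rows);
  return slots;
}
```

For example, 100 build rows at a load factor of 0.5 need at least 200 slots, while a load factor of 1.0 needs only 100 at the cost of more collisions during probing.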

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> inner_join(cudf::table_view const &probe, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the row indices that can be used to construct the result of performing an inner join between two tables.

See also

cudf::inner_join().

Parameters:
  • probe – The probe table, from which the keys are probed

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned indices’ device memory.

Returns:

A pair of vectors [probe_indices, build_indices] that can be used to construct the result of performing an inner join between two tables with build and probe as the join keys.

std::unique_ptr<rmm::device_uvector<size_type>> left_join(cudf::table_view const &probe, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the build table indices that can be used to construct the result of performing a left join between two tables.

Note

For a given row index i of the probe table, the resulting build_indices[i] contains the row index of the matched row from the build table if there is a match. Otherwise, it contains JoinNoneValue.

Parameters:
  • probe – The probe table, from which the keys are probed

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory.

Returns:

A build_indices vector that can be used to construct the result of performing a left join between two tables with build and probe as the join keys.
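
The build-once, probe-many pattern and the shape of the left_join result can be sketched on the host with std::unordered_map. This is an illustration under assumptions, not cuDF code: keys are a single integer column, nulls are ignored, and kNoMatch is a stand-in sentinel whose value is assumed here and may differ from the real JoinNoneValue.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <unordered_map>
#include <vector>

// Assumed stand-in for the JoinNoneValue sentinel; the real constant may differ.
constexpr int kNoMatch = std::numeric_limits<int>::min();

// Hypothetical host-side analogue of a distinct hash join.
class distinct_hash_join_reference {
 public:
  // Build phase: runs once and records the row index of every distinct key.
  explicit distinct_hash_join_reference(std::vector<int> const& build)
  {
    for (std::size_t i = 0; i < build.size(); ++i) {
      bool const inserted = _row_of_key.emplace(build[i], static_cast<int>(i)).second;
      assert(inserted);  // the build table must not contain duplicates
      (void)inserted;
    }
  }

  // Probe phase: can be called repeatedly; one output entry per probe row.
  std::vector<int> left_join(std::vector<int> const& probe) const
  {
    std::vector<int> build_indices(probe.size(), kNoMatch);
    for (std::size_t i = 0; i < probe.size(); ++i) {
      auto const it = _row_of_key.find(probe[i]);
      if (it != _row_of_key.end()) { build_indices[i] = it->second; }
    }
    return build_indices;
  }

 private:
  std::unordered_map<int, int> _row_of_key;
};
```

Because build keys are distinct, each probe row has at most one match, so a single vector aligned with the probe table is enough to describe the left join.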

class hash_join#
#include <hash_join.hpp>

Hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

This class enables the hash join scheme, which builds the hash table once and probes it as many times as needed (possibly in parallel).

Public Types

using impl_type = typename cudf::detail::hash_join<cudf::hashing::detail::MurmurHash3_x86_32<cudf::hash_value_type>>#

Implementation type.

Public Functions

hash_join(cudf::table_view const &build, null_equality compare_nulls, rmm::cuda_stream_view stream = cudf::get_default_stream())#

Construct a hash join object for subsequent probe calls.

Note

The hash_join object must not outlive the table viewed by build, else behavior is undefined.

Throws:

cudf::logic_error – if the build table has no columns

Parameters:
  • build – The build table, from which the hash table is built

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

hash_join(cudf::table_view const &build, nullable_join has_nulls, null_equality compare_nulls, double load_factor, rmm::cuda_stream_view stream = cudf::get_default_stream())#

Construct a hash join object for subsequent probe calls.

Note

The hash_join object must not outlive the table viewed by build, else behavior is undefined.

Throws:
  • cudf::logic_error – if the build table has no columns

  • std::invalid_argument – if load_factor is not greater than 0 and less than or equal to 1

Parameters:
  • build – The build table, from which the hash table is built

  • has_nulls – Flag to indicate whether any nulls exist in the build table or in any probe table that will later be used for the join

  • compare_nulls – Controls whether null join-key values should match or not

  • load_factor – The hash table occupancy ratio in (0,1]. A value of 0.5 means 50% desired occupancy.

  • stream – CUDA stream used for device memory operations and kernel launches

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> inner_join(cudf::table_view const &probe, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the row indices that can be used to construct the result of performing an inner join between two tables.

See also

cudf::inner_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory.

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> left_join(cudf::table_view const &probe, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the row indices that can be used to construct the result of performing a left join between two tables.

See also

cudf::left_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory.

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> full_join(cudf::table_view const &probe, std::optional<std::size_t> output_size = {}, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the row indices that can be used to construct the result of performing a full join between two tables.

See also

cudf::full_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • output_size – Optional value which allows users to specify the exact output size

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the returned table and columns’ device memory.

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

A pair of vectors [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with build and probe as the join keys.

std::size_t inner_join_size(cudf::table_view const &probe, rmm::cuda_stream_view stream = cudf::get_default_stream()) const#

Returns the exact number of matches (rows) when performing an inner join with the specified probe table.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • stream – CUDA stream used for device memory operations and kernel launches

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

The exact number of output rows when performing an inner join between two tables with build and probe as the join keys.

std::size_t left_join_size(cudf::table_view const &probe, rmm::cuda_stream_view stream = cudf::get_default_stream()) const#

Returns the exact number of matches (rows) when performing a left join with the specified probe table.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • stream – CUDA stream used for device memory operations and kernel launches

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

The exact number of output rows when performing a left join between two tables with build and probe as the join keys.

std::size_t full_join_size(cudf::table_view const &probe, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const#

Returns the exact number of matches (rows) when performing a full join with the specified probe table.

Parameters:
  • probe – The probe table, from which the tuples are probed

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the intermediate table and columns’ device memory.

Throws:

cudf::logic_error – If the input probe table has nulls while this hash_join object was not constructed with null check.

Returns:

The exact number of output rows when performing a full join between two tables with build and probe as the join keys.
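
How the three sizes relate to each other can be worked through on the host. The sketch below assumes the probe table plays the left role and the build table the right role, uses a single integer key column, and ignores nulls; the struct and function names are hypothetical and the code is not the cuDF implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct join_sizes {
  std::size_t inner;
  std::size_t left;
  std::size_t full;
};

// Hypothetical host-side reference for inner, left, and full join output sizes.
join_sizes join_sizes_reference(std::vector<int> const& build, std::vector<int> const& probe)
{
  // Build phase: number of build rows per key.
  std::unordered_map<int, std::size_t> build_count;
  for (int key : build) { ++build_count[key]; }

  std::size_t inner = 0;
  std::size_t left  = 0;
  std::unordered_set<int> probed_keys;
  for (int key : probe) {
    auto const it             = build_count.find(key);
    std::size_t const matches = (it == build_count.end()) ? 0 : it->second;
    inner += matches;
    left += (matches == 0) ? 1 : matches;  // unmatched probe rows are kept once
    probed_keys.insert(key);
  }

  // A full join additionally keeps build rows that no probe row matched.
  std::size_t unmatched_build_rows = 0;
  for (auto const& entry : build_count) {
    if (probed_keys.count(entry.first) == 0) { unmatched_build_rows += entry.second; }
  }

  assert(inner <= left);
  return {inner, left, left + unmatched_build_rows};
}
```

With build keys {1, 2, 2, 3} and probe keys {2, 4}, the inner join has 2 rows, the left join 3 (the unmatched probe key 4 is kept), and the full join 5 (build keys 1 and 3 are kept as well).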

class sort_merge_join#
#include <sort_merge_join.hpp>

Class that implements sort-merge algorithm for table joins.

Public Functions

sort_merge_join(table_view const &right, sorted is_right_sorted, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream())#

Construct a sort-merge join object that pre-processes the right table on creation, and can be used on subsequent join operations with multiple left tables.

Note

The sort_merge_join object must not outlive the table viewed by right, else behavior is undefined.

Parameters:
  • right – The right table

  • is_right_sorted – Enum to indicate if right table is pre-sorted

  • compare_nulls – Controls whether null join-key values should match or not

  • stream – CUDA stream used for device memory operations and kernel launches

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> inner_join(table_view const &left, sorted is_left_sorted, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns the row indices that can be used to construct the result of performing an inner join between the right table passed while creating the sort_merge_join object, and the left table.

See also

cudf::inner_join().

Parameters:
  • left – The left table

  • is_left_sorted – Enum to indicate if left table is pre-sorted

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the join indices’ device memory.

Returns:

A pair of device vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables.

match_context inner_join_match_context(table_view const &left, sorted is_left_sorted, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Returns context information about matches between the left and right tables.

This method computes, for each row in the left table, how many matching rows exist in the right table according to inner join semantics, and returns the number of matches through a match_context object.

This is particularly useful for:

  • Determining the total size of a potential join result without materializing it

  • Planning partitioned join operations for large datasets

The returned match_context can be used directly with partitioned_inner_join() to process large joins in manageable chunks.

Parameters:
  • left – The left table to join with the pre-processed right table

  • is_left_sorted – Enum to indicate if left table is pre-sorted

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the result device memory

Returns:

A match_context object containing the left table view and a device vector of match counts for each row in the left table.
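
One way to use the match counts for planning is to choose partition boundaries so that each partition's output stays within a row budget. The following host-side sketch assumes the counts have already been copied from device memory into a std::vector; the function name and the budget parameter are hypothetical and not part of the cuDF API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical planner: splits left rows into [start, end) ranges whose summed
// match counts stay within max_output_rows. A single row whose count alone
// exceeds the budget is placed in a range of its own.
std::vector<std::pair<int, int>> plan_partitions(std::vector<int> const& match_counts,
                                                 std::size_t max_output_rows)
{
  assert(max_output_rows > 0);
  int const num_rows = static_cast<int>(match_counts.size());

  std::vector<std::pair<int, int>> partitions;
  int start           = 0;
  std::size_t running = 0;
  for (int row = 0; row < num_rows; ++row) {
    auto const count = static_cast<std::size_t>(match_counts[row]);
    // Close the current range before it would exceed the budget.
    if (row > start && running + count > max_output_rows) {
      partitions.emplace_back(start, row);
      start   = row;
      running = 0;
    }
    running += count;
  }
  if (start < num_rows) { partitions.emplace_back(start, num_rows); }
  return partitions;
}
```

Each resulting [start, end) pair corresponds to the start and exclusive end row indices that a partition of the left table would use.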

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> partitioned_inner_join(partition_context const &context, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

Performs an inner join between a partition of the left table and the right table.

This method executes an inner join operation between a specific partition of the left table (defined by the partition_context) and the right table that was provided when constructing the sort_merge_join object. The partition_context must have been previously created by calling inner_join_match_context().

This partitioning approach enables processing large joins in smaller, memory-efficient chunks while producing the same results as if the entire join were performed at once. This is particularly useful for handling large datasets that would otherwise exceed available memory.

The returned indices can be used to construct the join result for this partition. The left_indices are relative to the original complete left table (not just the partition), so they can be used directly with the original left table to extract matching rows.

// Create join object with pre-processed right table
sort_merge_join join_obj(right_table, sorted::NO);

// Get match context for the entire left table
auto context = join_obj.inner_join_match_context(left_table, sorted::NO);

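// match_context owns a std::unique_ptr and cannot be copied, so move it into
// the partition context once and update only the bounds inside the loop
sort_merge_join::partition_context part_ctx{std::move(context), 0, 0};

// Define partition boundaries (e.g., process 1000 rows at a time)
for (size_type start = 0; start < left_table.num_rows(); start += 1000) {
  // Set the bounds of the current partition; the end index is exclusive
  part_ctx.left_start_idx = start;
  part_ctx.left_end_idx   = std::min(start + 1000, left_table.num_rows());

  // Get join indices for this partition
  auto [left_indices, right_indices] = join_obj.partitioned_inner_join(part_ctx);

  // Process the partition result...
}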

Parameters:
  • context – The partition context containing match information and partition bounds

  • stream – CUDA stream used for device memory operations and kernel launches

  • mr – Device memory resource used to allocate the join indices’ device memory

Returns:

A pair of device vectors [left_indices, right_indices] containing the row indices from both tables that satisfy the join condition for this partition. The left_indices are relative to the complete left table, not just the partition.

struct match_context#
#include <sort_merge_join.hpp>

Holds context information about matches between tables during a join operation.

This structure stores the left table view and a device vector containing the count of matching rows in the right table for each row in the left table. Used primarily by inner_join_match_context() to track join match information.

Public Members

table_view _left_table#

View of the left table involved in the join operation.

std::unique_ptr<rmm::device_uvector<size_type>> _match_counts#

A device vector containing the count of matching rows in the right table for each row in the left table.

struct partition_context#
#include <sort_merge_join.hpp>

Stores context information for partitioned join operations.

This structure maintains context for partitioned join operations, containing the match context from a previous join operation along with the start and end indices that define the current partition of the left table being processed.

Used with partitioned_inner_join() to perform large joins in smaller chunks while preserving the context from the initial match operation.

Public Members

match_context left_table_context#

The match context from a previous inner_join_match_context call.

size_type left_start_idx#

The starting row index of the current left table partition.

size_type left_end_idx#

The ending row index (exclusive) of the current left table partition.