Class that implements the sort-merge algorithm for table joins.
#include <sort_merge_join.hpp>
Public Member Functions
sort_merge_join (table_view const &right, sorted is_right_sorted, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream())
Construct a sort-merge join object that pre-processes the right table on creation and can be reused for subsequent join operations with multiple left tables.
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > inner_join (table_view const &left, sorted is_left_sorted, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Returns the row indices that can be used to construct the result of an inner join between the left table and the right table passed when creating the sort_merge_join object.
cudf::join_match_context inner_join_match_context (table_view const &left, sorted is_left_sorted, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Returns context information about matches between the left and right tables.
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > partitioned_inner_join (cudf::join_partition_context const &context, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Performs an inner join between a partition of the left table and the right table.
Definition at line 45 of file sort_merge_join.hpp.
cudf::sort_merge_join::sort_merge_join(table_view const& right, sorted is_right_sorted, null_equality compare_nulls = null_equality::EQUAL, rmm::cuda_stream_view stream = cudf::get_default_stream())
Construct a sort-merge join object that pre-processes the right table on creation, and can be used on subsequent join operations with multiple left tables.
The sort_merge_join object must not outlive the table viewed by right, else behavior is undefined.
right | The right table |
is_right_sorted | Enum to indicate if right table is pre-sorted |
compare_nulls | Controls whether null join-key values should match or not |
stream | CUDA stream used for device memory operations and kernel launches |
std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> cudf::sort_merge_join::inner_join(table_view const& left, sorted is_left_sorted, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns the row indices that can be used to construct the result of performing an inner join between the right table passed while creating the sort_merge_join object, and the left table.
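To make the shape of the returned indices concrete, here is a minimal host-side sketch of a sort-merge inner join over a single integer key column. It is an illustration only, not the cudf implementation: the names host_sort_merge_inner_join and sorted_order are hypothetical, and the code runs on the CPU with std::vector rather than device memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using index_vector = std::vector<int>;

// Hypothetical helper: row indices of `keys`, ordered so the keys are ascending.
index_vector sorted_order(std::vector<int> const& keys)
{
  index_vector order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return keys[a] < keys[b]; });
  return order;
}

// Hypothetical host-side sort-merge inner join on one integer key column.
// Returns [left_indices, right_indices]: entry i pairs left row
// left_indices[i] with right row right_indices[i].
std::pair<index_vector, index_vector> host_sort_merge_inner_join(
  std::vector<int> const& left, std::vector<int> const& right)
{
  auto const lorder = sorted_order(left);
  auto const rorder = sorted_order(right);
  index_vector left_indices;
  index_vector right_indices;

  std::size_t i = 0;
  std::size_t j = 0;
  while (i < lorder.size() && j < rorder.size()) {
    int const lkey = left[lorder[i]];
    int const rkey = right[rorder[j]];
    if (lkey < rkey) {
      ++i;
    } else if (rkey < lkey) {
      ++j;
    } else {
      // Find the end of the run of equal keys on each side.
      std::size_t i_end = i;
      std::size_t j_end = j;
      while (i_end < lorder.size() && left[lorder[i_end]] == lkey) { ++i_end; }
      while (j_end < rorder.size() && right[rorder[j_end]] == lkey) { ++j_end; }
      // Emit the cross product of the two runs as index pairs.
      for (std::size_t a = i; a < i_end; ++a) {
        for (std::size_t b = j; b < j_end; ++b) {
          left_indices.push_back(lorder[a]);
          right_indices.push_back(rorder[b]);
        }
      }
      i = i_end;
      j = j_end;
    }
  }
  return {std::move(left_indices), std::move(right_indices)};
}
```

Gathering rows from each table with the two index vectors, position by position, yields the joined rows. The indices always refer to rows of the original, unsorted inputs.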
left | The left table |
is_left_sorted | Enum to indicate if left table is pre-sorted |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the join indices' device memory. |
Returns a pair of vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables.
cudf::join_match_context cudf::sort_merge_join::inner_join_match_context(table_view const& left, sorted is_left_sorted, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Returns context information about matches between the left and right tables.
This method computes, for each row in the left table, how many matching rows exist in the right table according to inner join semantics, and returns these per-row match counts in a join_match_context object.
This is particularly useful for planning partitioned joins: the returned join_match_context can be used with partitioned_inner_join() to process large joins in manageable chunks.
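The per-row match counts described above can be sketched on the host as follows. This is an assumption-laden illustration, not the cudf implementation: host_match_counts and total_matches are hypothetical names, the key column is a plain std::vector<int>, and the right side is sorted once and then probed with a binary search.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: for each left key, count the right rows with an equal key.
// `right` is taken by value so it can be sorted without touching the caller's data.
std::vector<int> host_match_counts(std::vector<int> const& left, std::vector<int> right)
{
  std::sort(right.begin(), right.end());  // pre-process the right side once
  std::vector<int> counts;
  counts.reserve(left.size());
  for (int const key : left) {
    // equal_range returns the run of right keys equal to `key`.
    auto const range = std::equal_range(right.begin(), right.end(), key);
    counts.push_back(static_cast<int>(range.second - range.first));
  }
  return counts;
}

// Hypothetical helper: the size of the full inner join result is the sum of the counts.
long long total_matches(std::vector<int> const& counts)
{
  return std::accumulate(counts.begin(), counts.end(), 0LL);
}
```

Summing the counts gives the size of the full join result before any indices are materialized, which is the information needed to decide how to split the left table.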
left | The left table to join with the pre-processed right table |
is_left_sorted | Enum to indicate if left table is pre-sorted |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the result device memory |
std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>> cudf::sort_merge_join::partitioned_inner_join(cudf::join_partition_context const& context, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Performs an inner join between a partition of the left table and the right table.
This method executes an inner join operation between a specific partition of the left table (defined by the join_partition_context) and the right table that was provided when constructing the sort_merge_join object. The join_partition_context must be built from the join_match_context returned by a previous call to inner_join_match_context().
This partitioning approach enables processing large joins in smaller, memory-efficient chunks while producing the same results as if the entire join were performed at once. This is particularly useful for datasets whose full join result would otherwise exceed available memory.
The returned indices can be used to construct the join result for this partition. The left_indices are relative to the original complete left table (not just the partition), so they can be used directly with the original left table to extract matching rows.
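One way to choose partition bounds is to walk the per-row match counts and close a partition whenever the next row would push the output past a budget. The sketch below is a host-side illustration under that assumption. plan_partitions and max_pairs are hypothetical names that are not part of cudf, and each returned pair is a half-open [begin, end) range of left row indices.

```cpp
#include <utility>
#include <vector>

// Hypothetical planner: split left rows into contiguous [begin, end) ranges
// whose summed match counts stay within `max_pairs`. A single row whose count
// alone exceeds the budget is placed in a range of its own.
std::vector<std::pair<int, int>> plan_partitions(std::vector<int> const& match_counts,
                                                 long long max_pairs)
{
  std::vector<std::pair<int, int>> ranges;
  int const num_rows = static_cast<int>(match_counts.size());
  int begin = 0;
  long long pairs_in_range = 0;

  for (int row = 0; row < num_rows; ++row) {
    // Close the current range if adding this row would exceed the budget.
    if (row > begin && pairs_in_range + match_counts[row] > max_pairs) {
      ranges.emplace_back(begin, row);
      begin = row;
      pairs_in_range = 0;
    }
    pairs_in_range += match_counts[row];
  }
  // Emit the final, possibly partial, range.
  if (begin < num_rows) { ranges.emplace_back(begin, num_rows); }
  return ranges;
}
```

Because the ranges are contiguous and cover every left row exactly once, concatenating the per-partition results reproduces the full join, and left indices remain valid against the original left table.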
context | The partition context containing match information and partition bounds |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the join indices' device memory |
Returns a pair of vectors [left_indices, right_indices] containing the row indices from both tables that satisfy the join condition for this partition. The left_indices are relative to the complete left table, not just the partition.