Mark-based hash join for semi/anti join with left table reuse.
#include <mark_join.hpp>
Public Member Functions
- mark_join (mark_join const &) = delete
- mark_join (mark_join &&) = delete
- mark_join & operator= (mark_join const &) = delete
- mark_join & operator= (mark_join &&) = delete
- mark_join (cudf::table_view const &left, cudf::null_equality compare_nulls, cudf::join_prefilter prefilter, rmm::cuda_stream_view stream = cudf::get_default_stream())
  Constructs a mark join object with explicit prefilter selection.
- mark_join (cudf::table_view const &left, double load_factor, cudf::null_equality compare_nulls = cudf::null_equality::EQUAL, cudf::join_prefilter prefilter = cudf::join_prefilter::NO, rmm::cuda_stream_view stream = cudf::get_default_stream())
  Constructs a mark join object with an explicit hash table load factor and prefilter selection.
- std::unique_ptr<rmm::device_uvector<size_type>> semi_join (cudf::table_view const &right, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
  Returns left row indices that have at least one match in the right table.
- std::unique_ptr<rmm::device_uvector<size_type>> anti_join (cudf::table_view const &right, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
  Returns left row indices that have no match in the right table.
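Builds a hash table from the left (build) table using a multiset, so duplicate keys are kept as separate entries. The probe kernel atomically marks each matching left entry by setting the most significant bit (MSB) of its stored hash value with a compare-and-swap (CAS); a retrieve kernel then collects the row indices of the marked entries (semi join) or the unmarked entries (anti join).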
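The mark-then-retrieve scheme can be sketched on the host with std::atomic. This is a simplified, single-threaded illustration, not the cudf kernel: it assumes 32-bit stored hashes whose MSB is kept clear at build time, and it treats slot i as left row i instead of storing a row index per slot. The names kMarkBit, stored_hash, try_mark and retrieve are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side sketch. Each slot holds a 32-bit hash whose most
// significant bit (MSB) is reserved as the "matched" mark.
constexpr std::uint32_t kMarkBit = 1u << 31;

bool is_marked(std::uint32_t h) { return (h & kMarkBit) != 0; }

// Build step: store a hash with its MSB cleared so the bit is free to act as the mark.
std::uint32_t stored_hash(std::uint32_t raw_hash) { return raw_hash & ~kMarkBit; }

// Probe step: set the mark with compare-and-swap. Returns true only for the
// call that flipped the bit; later (or losing concurrent) callers see it set.
bool try_mark(std::atomic<std::uint32_t>& slot)
{
  std::uint32_t expected = slot.load(std::memory_order_relaxed);
  while (!is_marked(expected)) {
    if (slot.compare_exchange_weak(expected, expected | kMarkBit, std::memory_order_relaxed)) {
      return true;
    }
    // On failure, `expected` now holds the slot's current value; re-check it.
  }
  return false;
}

// Retrieve step: collect slots that were marked (semi) or left unmarked (anti).
std::vector<std::size_t> retrieve(std::vector<std::atomic<std::uint32_t>> const& slots,
                                  bool want_marked)
{
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < slots.size(); ++i) {
    if (is_marked(slots[i].load(std::memory_order_relaxed)) == want_marked) { out.push_back(i); }
  }
  return out;
}
```

Because the mark only ever goes from clear to set, many probe rows hitting the same left entry leave it in the same final state; std::atomic::fetch_or(kMarkBit) would set the bit in a single call, and the explicit loop is shown only to mirror the compare-and-swap named in the description.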
This class enables building the hash table once and probing multiple times with different right (probe) tables, amortizing the build cost. Probe-side prefiltering can be enabled at construction time via join_prefilter.
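The build-once, probe-many cycle can be sketched on the host with std::unordered_multimap standing in for the GPU multiset. The sketch rests on simplifying assumptions (single int key column, no null handling, no prefilter), and host_mark_join is a hypothetical name, not a cudf type. Marks are kept in a per-call buffer so the built index is unchanged between probes; whether cudf resets its marks this way is not stated in this reference.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical host-side sketch: build once from the left keys, then probe
// with any number of right tables.
class host_mark_join {
 public:
  explicit host_mark_join(std::vector<int> const& left_keys) : num_left_(left_keys.size())
  {
    // Build phase: one entry per left row, duplicates kept (multiset semantics).
    for (std::size_t row = 0; row < left_keys.size(); ++row) {
      index_.emplace(left_keys[row], row);
    }
  }

  // Left rows with at least one match in `right_keys`.
  std::vector<std::size_t> semi_join(std::vector<int> const& right_keys) const
  {
    return retrieve(mark(right_keys), true);
  }

  // Left rows with no match in `right_keys`.
  std::vector<std::size_t> anti_join(std::vector<int> const& right_keys) const
  {
    return retrieve(mark(right_keys), false);
  }

 private:
  // Probe phase: mark every left row whose key appears in `right_keys`.
  std::vector<bool> mark(std::vector<int> const& right_keys) const
  {
    std::vector<bool> marked(num_left_, false);
    for (int key : right_keys) {
      auto range = index_.equal_range(key);
      for (auto it = range.first; it != range.second; ++it) { marked[it->second] = true; }
    }
    return marked;
  }

  // Retrieve phase: collect marked (semi) or unmarked (anti) left rows.
  static std::vector<std::size_t> retrieve(std::vector<bool> const& marked, bool want_marked)
  {
    std::vector<std::size_t> rows;
    for (std::size_t row = 0; row < marked.size(); ++row) {
      if (marked[row] == want_marked) { rows.push_back(row); }
    }
    return rows;
  }

  std::size_t num_left_;
  std::unordered_multimap<int, std::size_t> index_;
};
```

For example, building host_mark_join from left keys {1, 2, 2, 3, 5} and probing with {2, 5, 7} gives semi rows {1, 2, 4} and anti rows {0, 3}; both left rows with key 2 are reported, which is why the build side keeps duplicates. This sketch returns indices in ascending order; the reference does not specify the order of cudf's output.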
For the common case where the right (filter) table is reused, use cudf::filtered_join instead, which builds a distinct set from the right table.
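The reverse orientation can be sketched the same way: collapse the right keys into a distinct set once, then stream each left table past it with one lookup per row and no marking. host_filter is a hypothetical name chosen for illustration; this is not the cudf::filtered_join API, whose signature is not shown in this reference.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical host-side sketch: build a distinct set from the right (filter)
// keys once, then filter any number of left tables against it.
class host_filter {
 public:
  explicit host_filter(std::vector<int> const& right_keys)
    : keys_(right_keys.begin(), right_keys.end())  // duplicate right keys collapse here
  {
  }

  // Left rows whose key is present in the filter set (semi join).
  std::vector<std::size_t> semi_join(std::vector<int> const& left_keys) const
  {
    return select(left_keys, true);
  }

  // Left rows whose key is absent from the filter set (anti join).
  std::vector<std::size_t> anti_join(std::vector<int> const& left_keys) const
  {
    return select(left_keys, false);
  }

  std::size_t distinct_size() const { return keys_.size(); }

 private:
  std::vector<std::size_t> select(std::vector<int> const& left_keys, bool want_present) const
  {
    std::vector<std::size_t> rows;
    for (std::size_t row = 0; row < left_keys.size(); ++row) {
      bool const present = keys_.count(left_keys[row]) > 0;
      if (present == want_present) { rows.push_back(row); }
    }
    return rows;
  }

  std::unordered_set<int> keys_;
};
```

With right keys {2, 5, 5, 7} the set holds three distinct keys, and probing left keys {1, 2, 2, 3, 5} yields semi rows {1, 2, 4} and anti rows {0, 3}. Each left row needs only a membership test, so no per-entry mark state is involved.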
Definition at line 61 of file mark_join.hpp.
cudf::mark_join::mark_join (cudf::table_view const &left, cudf::null_equality compare_nulls, cudf::join_prefilter prefilter, rmm::cuda_stream_view stream = cudf::get_default_stream())
Constructs a mark join object with explicit prefilter selection.
Parameters:
- left: The left table; the hash table is built from this table.
- compare_nulls: Controls whether null join-key values should match or not.
- prefilter: Controls whether an optional probe-side prefilter is enabled.
- stream: CUDA stream used for device memory operations and kernel launches.
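The effect of compare_nulls on a single key column can be sketched with std::optional, where an empty optional stands for a null. null_policy is a hypothetical stand-in mirroring the EQUAL and UNEQUAL values of cudf::null_equality, and keys_match is illustrative only.

```cpp
#include <optional>

// Hypothetical stand-in mirroring cudf::null_equality's EQUAL / UNEQUAL values.
enum class null_policy { EQUAL, UNEQUAL };

// Decide whether two nullable join keys match under the given policy.
bool keys_match(std::optional<int> const& a, std::optional<int> const& b, null_policy nulls)
{
  if (!a.has_value() || !b.has_value()) {
    // Null vs. null matches only under EQUAL; null vs. non-null never matches.
    return !a.has_value() && !b.has_value() && nulls == null_policy::EQUAL;
  }
  return *a == *b;
}
```

In this model, under UNEQUAL a left row whose join key is null can never be marked, so it always lands in the anti-join result and never in the semi-join result.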
cudf::mark_join::mark_join (cudf::table_view const &left, double load_factor, cudf::null_equality compare_nulls = cudf::null_equality::EQUAL, cudf::join_prefilter prefilter = cudf::join_prefilter::NO, rmm::cuda_stream_view stream = cudf::get_default_stream())
Constructs a mark join object with an explicit hash table load factor and prefilter selection.
Parameters:
- left: The left table; the hash table is built from this table.
- load_factor: Hash table load factor in the range (0, 1].
- compare_nulls: Controls whether null join-key values should match or not.
- prefilter: Controls whether an optional probe-side prefilter is enabled.
- stream: CUDA stream used for device memory operations and kernel launches.
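Assuming the load factor is the target ratio of filled slots to total slots, the minimum table size follows from simple arithmetic: capacity >= ceil(num_rows / load_factor). min_capacity is a hypothetical helper, and cudf may round the capacity up further, so treat the result as a lower bound rather than the exact allocation.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: smallest slot count that keeps occupancy at or below
// `load_factor` for `num_rows` entries. Real hash tables may round this up
// further (for example to a bucket or window multiple).
std::size_t min_capacity(std::size_t num_rows, double load_factor)
{
  // Reject values outside the documented (0, 1] range, including NaN.
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  return static_cast<std::size_t>(std::ceil(static_cast<double>(num_rows) / load_factor));
}
```

For example, 1000 left rows at load_factor 0.5 need at least 2000 slots. A lower load factor spends more memory in exchange for shorter probe sequences; 1.0 is the upper end of the documented range.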
std::unique_ptr<rmm::device_uvector<size_type>> cudf::mark_join::anti_join (cudf::table_view const &right, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns left row indices that have no match in the right table.
Parameters:
- right: The right table; probed against the hash table built from the left table.
- stream: CUDA stream used for device memory operations and kernel launches.
- mr: Device memory resource used to allocate the returned device memory.
std::unique_ptr<rmm::device_uvector<size_type>> cudf::mark_join::semi_join (cudf::table_view const &right, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns left row indices that have at least one match in the right table.
Parameters:
- right: The right table; probed against the hash table built from the left table.
- stream: CUDA stream used for device memory operations and kernel launches.
- mr: Device memory resource used to allocate the returned device memory.