Filtered hash join that builds a hash table on creation and probes it in subsequent *_join member functions.
#include <filtered_join.hpp>
Public Member Functions
filtered_join (filtered_join const &)=delete
filtered_join (filtered_join &&)=delete
filtered_join & operator= (filtered_join const &)=delete
filtered_join & operator= (filtered_join &&)=delete
filtered_join (cudf::table_view const &build, cudf::null_equality compare_nulls=null_equality::EQUAL, set_as_build_table reuse_tbl=set_as_build_table::RIGHT, rmm::cuda_stream_view stream=cudf::get_default_stream())
    Constructs a filtered hash join object for subsequent probe calls.
filtered_join (cudf::table_view const &build, null_equality compare_nulls=null_equality::EQUAL, set_as_build_table reuse_tbl=set_as_build_table::RIGHT, double load_factor=0.5, rmm::cuda_stream_view stream=cudf::get_default_stream())
    Constructs a filtered hash join object for subsequent probe calls.
std::unique_ptr< rmm::device_uvector< size_type > > semi_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
    Returns a vector of row indices corresponding to a semi-join between the specified tables.
std::unique_ptr< rmm::device_uvector< size_type > > anti_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
    Returns a vector of row indices corresponding to an anti-join between the specified tables.
This class enables a filtered hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel). When the hash table is built from the right table, i.e. the table that acts as the filter applied to left tables in subsequent *_join operations, the underlying data structure is cuco::static_set. When the left table is to be reused instead, the underlying data structure is cuco::static_multiset. Since multiset operations are computationally more expensive than set operations, right table reuse should be preferred where possible.
Definition at line 65 of file filtered_join.hpp.
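The build-once, probe-many pattern for the right-reuse case can be illustrated on the host with the standard library. This is only a sketch of the join semantics, not the cudf implementation: `host_filter` is a hypothetical name, `std::unordered_set` stands in for cuco::static_set, keys are plain `int` values rather than table rows, and the ascending order of the returned indices is a property of this sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical host-side stand-in for a filter built once from the right table's keys.
class host_filter {
 public:
  explicit host_filter(std::vector<int> const& right_keys)
    : keys_(right_keys.begin(), right_keys.end())
  {
  }

  // Left row indices whose key appears in the filter (semi-join semantics).
  std::vector<std::int32_t> semi_join(std::vector<int> const& left_keys) const
  {
    return probe(left_keys, true);
  }

  // Left row indices whose key does not appear in the filter (anti-join semantics).
  std::vector<std::int32_t> anti_join(std::vector<int> const& left_keys) const
  {
    return probe(left_keys, false);
  }

 private:
  // Scans the probe keys and keeps the rows whose match status equals keep_matches.
  std::vector<std::int32_t> probe(std::vector<int> const& left_keys, bool keep_matches) const
  {
    std::vector<std::int32_t> indices;
    for (std::size_t i = 0; i < left_keys.size(); ++i) {
      bool const found = keys_.count(left_keys[i]) > 0;
      if (found == keep_matches) { indices.push_back(static_cast<std::int32_t>(i)); }
    }
    return indices;
  }

  std::unordered_set<int> keys_;  // duplicate right keys collapse to a single entry
};
```

Because the filter only needs to answer "is this key present?", duplicate keys in the right table can be discarded at build time, which is why a set is sufficient in this case. The same `host_filter` object can then be probed with any number of left tables.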
cudf::filtered_join::filtered_join (
    cudf::table_view const & build,
    cudf::null_equality compare_nulls = null_equality::EQUAL,
    set_as_build_table reuse_tbl = set_as_build_table::RIGHT,
    rmm::cuda_stream_view stream = cudf::get_default_stream() )
Constructs a filtered hash join object for subsequent probe calls.
Parameters:
    build          The build table
    compare_nulls  Controls whether null join-key values should match or not
    reuse_tbl      Specifies which table to use as the build table. If LEFT, the build table is considered as the left table and is reused with multiple right (probe) tables. If RIGHT, the build table is considered as the right/filter table and will be applied to multiple left (probe) tables.
    stream         CUDA stream used for device memory operations and kernel launches
cudf::filtered_join::filtered_join (
    cudf::table_view const & build,
    null_equality compare_nulls = null_equality::EQUAL,
    set_as_build_table reuse_tbl = set_as_build_table::RIGHT,
    double load_factor = 0.5,
    rmm::cuda_stream_view stream = cudf::get_default_stream() )
Constructs a filtered hash join object for subsequent probe calls.
Parameters:
    build          The build table
    compare_nulls  Controls whether null join-key values should match or not
    reuse_tbl      Specifies which table to use as the build table. If LEFT, the build table is considered as the left table and is reused with multiple right (probe) tables. If RIGHT, the build table is considered as the right/filter table and will be applied to multiple left (probe) tables.
    load_factor    The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.
    stream         CUDA stream used for device memory operations and kernel launches
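To make the load_factor arithmetic concrete, the sketch below computes the minimum number of slots needed so that occupancy does not exceed the requested ratio. `slots_for` is a hypothetical helper written for illustration; how cudf and cuco actually size the table is not stated here. If the real container rounds its capacity up further (an assumption), that would be consistent with the note that the achieved occupancy may be slightly lower than requested.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: smallest slot count for which num_rows / slots <= load_factor.
std::size_t slots_for(std::size_t num_rows, double load_factor)
{
  // Mirrors the documented valid range (0, 1]; also rejects NaN.
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  return static_cast<std::size_t>(std::ceil(static_cast<double>(num_rows) / load_factor));
}
```

Under this model, 1000 build rows with the default load factor of 0.5 request 2000 slots, while a load factor of 1.0 requests exactly 1000. Lower load factors trade memory for fewer collisions during probing.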
std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::anti_join (
    cudf::table_view const & probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref() ) const
Returns a vector of row indices corresponding to an anti-join between the specified tables.
The returned vector contains the row indices from the left table for which there are no matching rows in the right table. Note that the left table is the build table if reuse_tbl was set to set_as_build_table::LEFT at construction, and is the probe table otherwise.
Parameters:
    probe   The probe table
    stream  CUDA stream used for device memory operations and kernel launches
    mr      Device memory resource used to allocate the device memory of the returned index vector
Returns: a vector left_indices that can be used to construct the result of performing a left anti join
std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::semi_join (
    cudf::table_view const & probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref() ) const
Returns a vector of row indices corresponding to a semi-join between the specified tables.
The returned vector contains the row indices from the left table for which there is a matching row in the right table. Note that the left table is the build table if reuse_tbl was set to set_as_build_table::LEFT at construction, and is the probe table otherwise.
Parameters:
    probe   The probe table
    stream  CUDA stream used for device memory operations and kernel launches
    mr      Device memory resource used to allocate the device memory of the returned index vector
Returns: a vector left_indices that can be used to construct the result of performing a left semi join
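When the build table is the left table, the returned indices refer to rows of the build table rather than the probe table, and every duplicate left row must keep its own index. The host-side sketch below illustrates why a multiset-like structure is needed in that case. It is an assumption-laden illustration, not the cudf implementation: `host_left_build` is a hypothetical name, `std::unordered_multimap` stands in for cuco::static_multiset, keys are plain `int` values, and the ascending output order is specific to this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for the left-reuse case: the build table is the left table.
class host_left_build {
 public:
  explicit host_left_build(std::vector<int> const& left_keys) : num_rows_(left_keys.size())
  {
    // Every left row is stored, so duplicate keys map to several row indices.
    for (std::size_t i = 0; i < left_keys.size(); ++i) {
      rows_by_key_.emplace(left_keys[i], static_cast<std::int32_t>(i));
    }
  }

  // Left (build) row indices that have a match in the right (probe) keys.
  std::vector<std::int32_t> semi_join(std::vector<int> const& right_keys) const
  {
    return select(right_keys, true);
  }

  // Left (build) row indices that have no match in the right (probe) keys.
  std::vector<std::int32_t> anti_join(std::vector<int> const& right_keys) const
  {
    return select(right_keys, false);
  }

 private:
  std::vector<std::int32_t> select(std::vector<int> const& right_keys, bool keep_matches) const
  {
    // Mark every build row reached by some probe key.
    std::vector<bool> matched(num_rows_, false);
    for (int key : right_keys) {
      auto const range = rows_by_key_.equal_range(key);
      for (auto it = range.first; it != range.second; ++it) {
        matched[static_cast<std::size_t>(it->second)] = true;
      }
    }
    // Collect the build rows whose match status equals keep_matches.
    std::vector<std::int32_t> indices;
    for (std::size_t i = 0; i < num_rows_; ++i) {
      if (matched[i] == keep_matches) { indices.push_back(static_cast<std::int32_t>(i)); }
    }
    return indices;
  }

  std::size_t num_rows_;
  std::unordered_multimap<int, std::int32_t> rows_by_key_;
};
```

Each probe key may touch several build rows here, whereas the right-reuse case needs only a single membership test per probe row. That extra work is in line with the recommendation to prefer right table reuse where possible.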