cudf::filtered_join Class Reference

Filtered hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

#include <filtered_join.hpp>

Public Member Functions

 filtered_join (filtered_join const &)=delete
 
 filtered_join (filtered_join &&)=delete
 
filtered_join & operator= (filtered_join const &)=delete
 
filtered_join & operator= (filtered_join &&)=delete
 
 filtered_join (cudf::table_view const &build, cudf::null_equality compare_nulls=null_equality::EQUAL, set_as_build_table reuse_tbl=set_as_build_table::RIGHT, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructs a filtered hash join object for subsequent probe calls.
 
 filtered_join (cudf::table_view const &build, null_equality compare_nulls=null_equality::EQUAL, set_as_build_table reuse_tbl=set_as_build_table::RIGHT, double load_factor=0.5, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructs a filtered hash join object for subsequent probe calls.
 
std::unique_ptr< rmm::device_uvector< size_type > > semi_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns a vector of row indices corresponding to a semi-join between the specified tables.
 
std::unique_ptr< rmm::device_uvector< size_type > > anti_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns a vector of row indices corresponding to an anti-join between the specified tables.
 

Detailed Description

Filtered hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

This class enables a filtered hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel). When the hash table is built from the right table, i.e. the table that acts as the filter applied to left tables in subsequent *_join operations, the cuco::static_set data structure is used. When the left table is reused instead, the underlying hash table is a cuco::static_multiset. Since multiset operations are computationally more expensive than set operations, prefer right-table reuse where possible.
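
One plausible reading of the set versus multiset distinction is that the result always holds left-table row indices, so a reused left table must keep every row individually addressable, including rows with duplicate keys, whereas a right (filter) table only needs to answer whether a key is present. The host-only sketch below illustrates that reading. It substitutes std::unordered_multimap for cuco::static_multiset, the class name left_reuse_filter is hypothetical, and the output is sorted only to make the example deterministic; none of this is the libcudf implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for a filtered join built from the LEFT table.
class left_reuse_filter {
 public:
  // Build once: remember the row index of every left key, duplicates included.
  explicit left_reuse_filter(std::vector<int> const& left_keys)
  {
    for (std::size_t i = 0; i < left_keys.size(); ++i) {
      rows_by_key_.emplace(left_keys[i], static_cast<int>(i));
    }
  }

  // Probe with a right table: return the left row indices that have a match.
  std::vector<int> semi_join(std::vector<int> const& right_keys) const
  {
    std::vector<int> out;
    for (int key : right_keys) {
      auto range = rows_by_key_.equal_range(key);
      for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
      }
    }
    // A left row is reported once even if several right rows match it.
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
  }

 private:
  std::unordered_multimap<int, int> rows_by_key_;
};
```

The same left_reuse_filter object can be probed with any number of right tables after a single build, which mirrors the build-once, probe-many scheme described above.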

Note
All NaNs are considered equal.
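
As a rough host-side illustration of this NaN rule, the sketch below defines a hash and an equality functor under which every NaN compares equal to every other NaN. The names nan_equal_hash, nan_equal_to and nan_aware_set are hypothetical, and the sketch makes no claim about how libcudf implements the comparison on the device.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical hash: all NaN bit patterns hash to the same value.
struct nan_equal_hash {
  std::size_t operator()(double v) const
  {
    if (std::isnan(v)) { return 0; }
    return std::hash<double>{}(v);
  }
};

// Hypothetical equality: NaN equals NaN; otherwise ordinary comparison.
struct nan_equal_to {
  bool operator()(double a, double b) const
  {
    if (std::isnan(a) && std::isnan(b)) { return true; }
    return a == b;
  }
};

using nan_aware_set = std::unordered_set<double, nan_equal_hash, nan_equal_to>;
```

With these functors, a NaN key in a probe table would match a NaN key in the build table, which plain operator== on floating-point values would never report.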

Definition at line 65 of file filtered_join.hpp.

Constructor & Destructor Documentation

◆ filtered_join() [1/2]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
cudf::null_equality  compare_nulls = null_equality::EQUAL,
set_as_build_table  reuse_tbl = set_as_build_table::RIGHT,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructs a filtered hash join object for subsequent probe calls.

Parameters
build          The build table
compare_nulls  Controls whether null join-key values should match or not
reuse_tbl      Specifies which table to use as the build table. If LEFT, the build table is considered as the left table and is reused with multiple right (probe) tables. If RIGHT, the build table is considered as the right/filter table and will be applied to multiple left (probe) tables.
stream         CUDA stream used for device memory operations and kernel launches

◆ filtered_join() [2/2]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
null_equality  compare_nulls = null_equality::EQUAL,
set_as_build_table  reuse_tbl = set_as_build_table::RIGHT,
double  load_factor = 0.5,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructs a filtered hash join object for subsequent probe calls.

Parameters
build          The build table
compare_nulls  Controls whether null join-key values should match or not
reuse_tbl      Specifies which table to use as the build table. If LEFT, the build table is considered as the left table and is reused with multiple right (probe) tables. If RIGHT, the build table is considered as the right/filter table and will be applied to multiple left (probe) tables.
load_factor    The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.
stream         CUDA stream used for device memory operations and kernel launches
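
The arithmetic behind load_factor can be sketched as follows. This is a minimal host-side illustration that assumes the table needs at least num_rows / load_factor slots; the helper names slots_for and occupancy are hypothetical, and the sizing policy libcudf actually applies is not stated here. If an implementation rounds the slot count up further (for example to a convenient size), the achieved occupancy falls below the requested value, which is consistent with the note in the parameter description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: smallest slot count that keeps occupancy <= load_factor.
std::size_t slots_for(std::size_t num_rows, double load_factor)
{
  assert(load_factor > 0.0 && load_factor <= 1.0);  // documented range (0,1]
  return static_cast<std::size_t>(
    std::ceil(static_cast<double>(num_rows) / load_factor));
}

// Fraction of slots that are filled once num_rows keys are stored.
double occupancy(std::size_t num_rows, std::size_t slots)
{
  return slots == 0 ? 0.0
                    : static_cast<double>(num_rows) / static_cast<double>(slots);
}
```

For instance, 1000 rows at load_factor 0.5 call for 2000 slots; if the table were instead allocated with 2048 slots, the occupancy would be about 0.49.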

Member Function Documentation

◆ anti_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::anti_join ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns a vector of row indices corresponding to an anti-join between the specified tables.

The returned vector contains the row indices from the left table for which there are no matching rows in the right table. Note that the left table is the build table if reuse_tbl was set to set_as_build_table::LEFT at construction, and is the probe table otherwise.

TableA: {{0, 1, 2}}
TableB: {{1, 2, 3}}
Result: {0}
Parameters
probe   The probe table
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned vector's device memory
Returns
A vector left_indices that can be used to construct the result of performing a left anti join
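
The anti-join semantics above can be checked with a small host-only sketch. It assumes the right table is the build (filter) table and the left table is the probe table, uses std::unordered_set in place of the device hash table, and the function name anti_join_indices is hypothetical. It returns indices in ascending order purely for readability.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical host-side anti-join: indices of left rows with no match on the right.
std::vector<int> anti_join_indices(std::vector<int> const& left_keys,
                                   std::vector<int> const& right_keys)
{
  // Build step: the right table acts as the filter.
  std::unordered_set<int> filter(right_keys.begin(), right_keys.end());

  // Probe step: keep the left rows whose key is absent from the filter.
  std::vector<int> left_indices;
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    if (filter.count(left_keys[i]) == 0) {
      left_indices.push_back(static_cast<int>(i));
    }
  }
  return left_indices;
}
```

With TableA as the left keys {0, 1, 2} and TableB as the right keys {1, 2, 3}, only row 0 of TableA has no match, so the sketch yields {0}.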

◆ semi_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::semi_join ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns a vector of row indices corresponding to a semi-join between the specified tables.

The returned vector contains the row indices from the left table for which there is a matching row in the right table. Note that the left table is the build table if reuse_tbl was set to set_as_build_table::LEFT at construction, and is the probe table otherwise.

TableA: {{0, 1, 2}}
TableB: {{1, 2, 3}}
Result: {1, 2}
Parameters
probe   The probe table
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned vector's device memory
Returns
A vector left_indices that can be used to construct the result of performing a left semi join
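
To show how the returned left_indices can be turned into the rows of a left semi join, the host-only sketch below computes the indices and then gathers a payload column with them. It assumes the right table is the build (filter) table; semi_join_indices and gather are hypothetical helper names, and std::vector stands in for device columns.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical host-side semi-join: indices of left rows that have a match on the right.
std::vector<int> semi_join_indices(std::vector<int> const& left_keys,
                                   std::vector<int> const& right_keys)
{
  std::unordered_set<int> filter(right_keys.begin(), right_keys.end());
  std::vector<int> left_indices;
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    if (filter.count(left_keys[i]) != 0) {
      left_indices.push_back(static_cast<int>(i));
    }
  }
  return left_indices;
}

// Hypothetical gather: pick the rows of one left-table column named by indices.
template <typename T>
std::vector<T> gather(std::vector<T> const& column, std::vector<int> const& indices)
{
  std::vector<T> out;
  out.reserve(indices.size());
  for (int i : indices) {
    out.push_back(column[static_cast<std::size_t>(i)]);
  }
  return out;
}
```

For the example tables, semi_join_indices gives {1, 2}, and gathering a left-table column {"a", "b", "c"} with those indices gives {"b", "c"}.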
