cudf::filtered_join Class Reference

Filtered hash join that builds a hash table from the right (filter) table on construction and probes it in subsequent *_join member function calls.

#include <filtered_join.hpp>

Public Member Functions

 filtered_join (filtered_join const &)=delete
 
 filtered_join (filtered_join &&)=delete
 
filtered_join & operator= (filtered_join const &)=delete
 
filtered_join & operator= (filtered_join &&)=delete
 
 filtered_join (cudf::table_view const &build, cudf::null_equality compare_nulls, rmm::cuda_stream_view stream)
 Constructs a filtered hash join object for subsequent probe calls.
 
 filtered_join (cudf::table_view const &build, cudf::null_equality compare_nulls, double load_factor, rmm::cuda_stream_view stream)
 Constructs a filtered hash join object for subsequent probe calls.
 
 filtered_join (cudf::table_view const &build, cudf::null_equality compare_nulls, set_as_build_table reuse_tbl, rmm::cuda_stream_view stream)
 Constructs a filtered hash join object for subsequent probe calls.
 
 filtered_join (cudf::table_view const &build, null_equality compare_nulls, set_as_build_table reuse_tbl, double load_factor, rmm::cuda_stream_view stream)
 Constructs a filtered hash join object for subsequent probe calls.
 
std::unique_ptr< rmm::device_uvector< size_type > > semi_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns a vector of row indices corresponding to a semi-join between the specified tables.
 
std::unique_ptr< rmm::device_uvector< size_type > > anti_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns a vector of row indices corresponding to an anti-join between the specified tables.
 

Detailed Description

Filtered hash join that builds a hash table from the right (filter) table on construction and probes it in subsequent *_join member function calls.

This class implements a filtered hash join scheme: it builds a hash table once from the right table and probes it as many times as needed (possibly in parallel) with different left tables. The right table acts as the filter applied to the left tables in subsequent *_join operations. The underlying data structure is cuco::static_set.
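The build-once, probe-many pattern can be illustrated with a host-only model. This is a sketch for intuition, not the libcudf implementation: the class name host_filtered_join is hypothetical, std::unordered_set stands in for cuco::static_set, and plain integer vectors stand in for single-column tables.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Host-side model only: std::unordered_set stands in for the device hash set.
class host_filtered_join {
 public:
  // Build phase: hash the right (filter) keys once.
  explicit host_filtered_join(std::vector<int> const& build)
    : keys_(build.begin(), build.end()) {}

  // Probe phase: indices of probe rows whose key is present in the filter.
  std::vector<std::size_t> semi_join(std::vector<int> const& probe) const {
    return probe_impl(probe, true);
  }

  // Probe phase: indices of probe rows whose key is absent from the filter.
  std::vector<std::size_t> anti_join(std::vector<int> const& probe) const {
    return probe_impl(probe, false);
  }

 private:
  std::vector<std::size_t> probe_impl(std::vector<int> const& probe,
                                      bool keep_matches) const {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < probe.size(); ++i) {
      bool const found = keys_.count(probe[i]) > 0;
      if (found == keep_matches) { out.push_back(i); }
    }
    return out;
  }

  std::unordered_set<int> keys_;  // built once, read-only afterwards
};
```

Because both probe functions are const and only read the set, one object can serve any number of left tables, which is the property the class description relies on.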

For use cases where the left table should be reused with multiple right tables, use cudf::mark_join instead.

Note
All NaNs are considered equal to each other.
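To see what this note implies for key matching, here is a host-only sketch of a hash set whose equality treats every NaN as the same key. The names nan_aware_equal, nan_aware_hash, and nan_aware_set are hypothetical, and the sketch does not claim to mirror how libcudf compares rows internally.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Equality that treats any NaN as equal to any other NaN.
struct nan_aware_equal {
  bool operator()(double a, double b) const {
    if (std::isnan(a) && std::isnan(b)) { return true; }
    return a == b;
  }
};

// Hash that sends every NaN to one fixed value so that equal keys hash equally.
struct nan_aware_hash {
  std::size_t operator()(double v) const {
    if (std::isnan(v)) { return 0x7ff8u; }  // arbitrary constant shared by all NaNs
    if (v == 0.0) { v = 0.0; }              // -0.0 == +0.0, so hash them identically
    return std::hash<double>{}(v);
  }
};

using nan_aware_set = std::unordered_set<double, nan_aware_hash, nan_aware_equal>;
```

With a default std::unordered_set<double>, a NaN probe key never finds a NaN build key because NaN != NaN under IEEE 754; with the set above it does.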

Definition at line 59 of file filtered_join.hpp.

Constructor & Destructor Documentation

◆ filtered_join() [1/4]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
cudf::null_equality  compare_nulls,
rmm::cuda_stream_view  stream 
)

Constructs a filtered hash join object for subsequent probe calls.

The build table is always treated as the right (filter) table. It will be applied to multiple left (probe) tables in subsequent semi_join or anti_join calls.

Parameters
build          The right (filter) table used to build the hash table
compare_nulls  Controls whether null join-key values should match or not
stream         CUDA stream used for device memory operations and kernel launches
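The effect of compare_nulls can be sketched on the host as follows. Everything here is a stand-in written for illustration: the enum is redeclared locally so the block compiles without libcudf, nullable_key and semi_join_with_nulls are hypothetical names, and a validity flag models a null join key.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Local stand-in for cudf::null_equality so the sketch is self-contained.
enum class null_equality { EQUAL, UNEQUAL };

// Hypothetical nullable key: is_valid == false models a null join key.
struct nullable_key {
  int value;
  bool is_valid;
};

std::vector<std::size_t> semi_join_with_nulls(std::vector<nullable_key> const& build,
                                              std::vector<nullable_key> const& probe,
                                              null_equality compare_nulls) {
  // Build phase: record valid keys and whether any null key exists.
  std::unordered_set<int> valid_keys;
  bool build_has_null = false;
  for (auto const& k : build) {
    if (k.is_valid) { valid_keys.insert(k.value); } else { build_has_null = true; }
  }

  // Probe phase: a null probe key matches only a null build key, and only under EQUAL.
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < probe.size(); ++i) {
    bool const match = probe[i].is_valid
      ? valid_keys.count(probe[i].value) > 0
      : (build_has_null && compare_nulls == null_equality::EQUAL);
    if (match) { out.push_back(i); }
  }
  return out;
}
```

With build keys {1, null} and probe keys {null, 1, 2}, this model returns indices {0, 1} under EQUAL and {1} under UNEQUAL.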

◆ filtered_join() [2/4]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
cudf::null_equality  compare_nulls,
double  load_factor,
rmm::cuda_stream_view  stream 
)

Constructs a filtered hash join object for subsequent probe calls.

The build table is always treated as the right (filter) table. It will be applied to multiple left (probe) tables in subsequent semi_join or anti_join calls.

Parameters
build          The right (filter) table used to build the hash table
compare_nulls  Controls whether null join-key values should match or not
load_factor    The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.
stream         CUDA stream used for device memory operations and kernel launches
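As a rough illustration of the load_factor arithmetic, the sketch below computes a slot count from a key count. The function slots_for is hypothetical, and the sizing rule (key count divided by load factor, rounded up) is an assumption made for the example; the reference does not state how the hash table capacity is actually chosen.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Assumed sizing rule: enough slots that num_keys / slots <= load_factor.
std::size_t slots_for(std::size_t num_keys, double load_factor) {
  // Enforce the documented range (0, 1]; the negated form also rejects NaN.
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  return static_cast<std::size_t>(
    std::ceil(static_cast<double>(num_keys) / load_factor));
}
```

Under this rule, 10 keys at a load factor of 0.75 need 14 slots, giving an occupancy of about 0.71. Rounding the slot count up is one way the achieved occupancy can end up slightly below the requested value.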

◆ filtered_join() [3/4]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
cudf::null_equality  compare_nulls,
set_as_build_table  reuse_tbl,
rmm::cuda_stream_view  stream 
)

Constructs a filtered hash join object for subsequent probe calls.

Deprecated:
Use the constructor without set_as_build_table instead.
Parameters
build          The build table
compare_nulls  Controls whether null join-key values should match or not
reuse_tbl      Specifies which table to use as the build table. Only RIGHT is supported.
stream         CUDA stream used for device memory operations and kernel launches

◆ filtered_join() [4/4]

cudf::filtered_join::filtered_join ( cudf::table_view const &  build,
null_equality  compare_nulls,
set_as_build_table  reuse_tbl,
double  load_factor,
rmm::cuda_stream_view  stream 
)

Constructs a filtered hash join object for subsequent probe calls.

Deprecated:
Use the constructor without set_as_build_table instead.
Parameters
build          The build table
compare_nulls  Controls whether null join-key values should match or not
reuse_tbl      Specifies which table to use as the build table. Only RIGHT is supported.
load_factor    The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.
stream         CUDA stream used for device memory operations and kernel launches

Member Function Documentation

◆ anti_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::anti_join ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns a vector of row indices corresponding to an anti-join between the specified tables.

The returned vector contains the row indices from the probe (left) table for which there are no matching rows in the build (right/filter) table.

Build (right): {{1, 2, 3}}
Probe (left): {{0, 1, 2}}
Result: {0}
Parameters
probe   The probe (left) table
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the device memory of the returned index vector
Returns
A vector left_indices that can be used to construct the result of performing a left anti join
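The returned indices are not the join result itself; the result is obtained by gathering the selected rows from the probe table. A host-only sketch of that step follows. The row type and gather_rows are hypothetical names used for illustration; with real device tables a gather routine such as cudf::gather would play this role.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side row; a cudf table would hold device columns instead.
struct row {
  int key;
  std::string payload;
};

// Materialize the join result by copying the probe rows selected by left_indices.
std::vector<row> gather_rows(std::vector<row> const& probe,
                             std::vector<std::size_t> const& left_indices) {
  std::vector<row> out;
  out.reserve(left_indices.size());
  for (std::size_t idx : left_indices) {
    out.push_back(probe.at(idx));  // at() throws on an out-of-range index
  }
  return out;
}
```

For the anti-join example above, gathering index 0 from the probe table yields the single row whose key is 0.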

◆ semi_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::filtered_join::semi_join ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns a vector of row indices corresponding to a semi-join between the specified tables.

The returned vector contains the row indices from the probe (left) table for which there is a matching row in the build (right/filter) table.

Build (right): {{1, 2, 3}}
Probe (left): {{0, 1, 2}}
Result: {1, 2}
Parameters
probe   The probe (left) table
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the device memory of the returned index vector
Returns
A vector left_indices that can be used to construct the result of performing a left semi join
