cudf::hash_join Class Reference

Hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

#include <join.hpp>

Public Types

enum  common_columns_output_side { common_columns_output_side::PROBE, common_columns_output_side::BUILD }
 Controls where common columns will be output for an inner join.
 

Public Member Functions

 hash_join (hash_join const &)=delete
 
 hash_join (hash_join &&)=delete
 
hash_join & operator= (hash_join const &)=delete
 
hash_join & operator= (hash_join &&)=delete
 
 hash_join (cudf::table_view const &build, std::vector< size_type > const &build_on, null_equality compare_nulls, rmm::cuda_stream_view stream=rmm::cuda_stream_default)
 Construct a hash join object for subsequent probe calls.
 
std::pair< std::unique_ptr< cudf::table >, std::unique_ptr< cudf::table > > inner_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, common_columns_output_side common_columns_output_side=common_columns_output_side::PROBE, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 Performs an inner join by probing in the internal hash table.
 
std::unique_ptr< cudf::table > left_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 Performs a left join by probing in the internal hash table.
 
std::unique_ptr< cudf::table > full_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 Performs a full join by probing in the internal hash table.
 

Detailed Description

Hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

This class enables a hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel).

Definition at line 380 of file join.hpp.
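
The build-once, probe-many scheme described above can be illustrated with a small CPU-only sketch. This is not libcudf code: HashJoinSketch is a hypothetical name, keys are plain integers without nulls, and the result is a list of (probe_row, build_row) index pairs rather than tables. It only shows why separating construction from probing lets one hash table serve several probe calls.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Conceptual CPU stand-in for a build-once, probe-many hash join.
// HashJoinSketch is a hypothetical name; this is not the libcudf class.
class HashJoinSketch {
 public:
  // Build phase: index every build-side key by its row number.
  explicit HashJoinSketch(std::vector<int> const& build_keys)
  {
    for (std::size_t row = 0; row < build_keys.size(); ++row) {
      table_.emplace(build_keys[row], row);
    }
  }

  // Probe phase: const, so the same object can serve many probe calls.
  // Returns (probe_row, build_row) pairs for every key match.
  std::vector<std::pair<std::size_t, std::size_t>> inner_join(
    std::vector<int> const& probe_keys) const
  {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t row = 0; row < probe_keys.size(); ++row) {
      auto const range = table_.equal_range(probe_keys[row]);
      for (auto it = range.first; it != range.second; ++it) {
        matches.emplace_back(row, it->second);
      }
    }
    return matches;
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;
};
```

In this sketch the hash table is filled exactly once in the constructor, and each inner_join call only reads it.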

Member Enumeration Documentation

◆ common_columns_output_side

Controls where common columns will be output for an inner join.

Enumerator
PROBE 

Common columns are output in the probe portion of the table pair returned by inner_join.

BUILD 

Common columns are output in the build portion of the table pair returned by inner_join.

Definition at line 408 of file join.hpp.

Constructor & Destructor Documentation

◆ hash_join()

cudf::hash_join::hash_join ( cudf::table_view const &  build,
std::vector< size_type > const &  build_on,
null_equality  compare_nulls,
rmm::cuda_stream_view  stream = rmm::cuda_stream_default 
)

Construct a hash join object for subsequent probe calls.

Note
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
Parameters
build: The build table, from which the hash table is built.
build_on: The column indices from build to join on.
compare_nulls: Controls whether null join-key values should match or not.
stream: CUDA stream used for device memory operations and kernel launches.
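
As an illustration of what compare_nulls controls, the following CPU-only sketch compares two nullable keys. It is an assumption-laden stand-in, not the libcudf implementation: NullEquality and keys_match are hypothetical names, and std::optional models a nullable join key.

```cpp
#include <optional>

// Hypothetical stand-in for a null-equality policy; not the libcudf definition.
enum class NullEquality { EQUAL, UNEQUAL };

// Returns true when two nullable join keys should be treated as a match.
bool keys_match(std::optional<int> const& lhs,
                std::optional<int> const& rhs,
                NullEquality compare_nulls)
{
  // Two valid keys match when their values are equal.
  if (lhs.has_value() && rhs.has_value()) { return *lhs == *rhs; }
  // Two null keys match only under the EQUAL policy.
  if (!lhs.has_value() && !rhs.has_value()) {
    return compare_nulls == NullEquality::EQUAL;
  }
  // A null key never matches a valid key.
  return false;
}
```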

Member Function Documentation

◆ full_join()

std::unique_ptr<cudf::table> cudf::hash_join::full_join ( cudf::table_view const &  probe,
std::vector< size_type > const &  probe_on,
std::vector< std::pair< cudf::size_type, cudf::size_type >> const &  columns_in_common,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Performs a full join by probing in the internal hash table.

For more details, see cudf::full_join().
Parameters
probe: The probe table, from which the tuples are probed.
probe_on: The column indices from probe to join on.
columns_in_common: A vector of pairs of column indices into probe and build, respectively, that are "in common". For each "common" pair, only a single output column is produced, gathered from the probe_on columns. Otherwise, an output column is produced for every column in probe_on and build_on. For each pair (P, B), P should exist in probe_on and B should exist in build_on.
compare_nulls: Controls whether null join-key values should match or not.
mr: Device memory resource used to allocate the returned table and columns' device memory.
stream: CUDA stream used for device memory operations and kernel launches.
Returns
Result of joining the build and probe tables on the columns specified by build_on and probe_on. The resulting table contains the columns of probe (including common columns) plus the columns of build (excluding common columns).
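
The row-level behavior of a full join can be sketched on the CPU as follows. This is a conceptual illustration only, not how libcudf computes the result: full_join_rows and the NO_MATCH sentinel are hypothetical, keys are integers without nulls, and the output is a list of (probe_row, build_row) index pairs in which NO_MATCH marks a side that contributes only nulls.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Sentinel marking "no matching row on this side" (an assumption of this sketch).
constexpr int NO_MATCH = -1;

// Returns (probe_row, build_row) pairs for a full join on integer keys.
std::vector<std::pair<int, int>> full_join_rows(std::vector<int> const& probe_keys,
                                                std::vector<int> const& build_keys)
{
  // Build phase: map each build key to its row index.
  std::unordered_multimap<int, int> table;
  for (std::size_t b = 0; b < build_keys.size(); ++b) {
    table.emplace(build_keys[b], static_cast<int>(b));
  }

  std::vector<bool> build_matched(build_keys.size(), false);
  std::vector<std::pair<int, int>> rows;

  // Probe phase: emit every match, or a single unmatched probe row.
  for (std::size_t p = 0; p < probe_keys.size(); ++p) {
    auto const range = table.equal_range(probe_keys[p]);
    if (range.first == range.second) {
      rows.emplace_back(static_cast<int>(p), NO_MATCH);
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      rows.emplace_back(static_cast<int>(p), it->second);
      build_matched[it->second] = true;
    }
  }

  // Finally, emit the build rows that no probe row matched.
  for (std::size_t b = 0; b < build_keys.size(); ++b) {
    if (!build_matched[b]) { rows.emplace_back(NO_MATCH, static_cast<int>(b)); }
  }
  return rows;
}
```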

◆ inner_join()

std::pair<std::unique_ptr<cudf::table>, std::unique_ptr<cudf::table> > cudf::hash_join::inner_join ( cudf::table_view const &  probe,
std::vector< size_type > const &  probe_on,
std::vector< std::pair< cudf::size_type, cudf::size_type >> const &  columns_in_common,
common_columns_output_side  common_columns_output_side = common_columns_output_side::PROBE,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Performs an inner join by probing in the internal hash table.

Because it is sometimes desirable to use the smaller table as the build side of an inner join, this function returns a (probe, build) table pair, containing the probe and build portions of the logical joined table respectively, so that the caller can rearrange them to restore the logical left/right order. This raises the question of where "common" columns should go: the legacy cudf::inner_join() API always outputs "common" columns in the left portion and omits the corresponding columns from the right portion. To align with that API, the common_columns_output_side parameter specifies whether "common" columns go in the probe or the build portion.

For more details, see cudf::inner_join().
Parameters
probe: The probe table, from which the tuples are probed.
probe_on: The column indices from probe to join on.
columns_in_common: A vector of pairs of column indices into probe and build, respectively, that are "in common". For each "common" pair, only a single output column is produced, gathered from the probe_on columns if common_columns_output_side is PROBE, or from the build_on columns if it is BUILD. Otherwise, an output column is produced for every column in probe_on and build_on. For each pair (P, B), P should exist in probe_on and B should exist in build_on.
common_columns_output_side: Selects which portion of the returned pair carries the common columns; see common_columns_output_side.
compare_nulls: Controls whether null join-key values should match or not.
mr: Device memory resource used to allocate the returned table and columns' device memory.
stream: CUDA stream used for device memory operations and kernel launches.
Returns
A (probe, build) table pair resulting from joining both tables on the columns specified by probe_on and build_on. The pair contains (probe (including common columns), build (excluding common columns)) if common_columns_output_side is PROBE, or (probe (excluding common columns), build (including common columns)) if common_columns_output_side is BUILD.
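
The column layout of the returned pair can be sketched as below. This is a hypothetical helper for illustration, not part of libcudf: OutputSide mirrors common_columns_output_side, and inner_join_layout only computes which column indices each half of the pair would keep, under the assumption that a common column is dropped from the half that was not chosen to carry it.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical mirror of common_columns_output_side for this sketch.
enum class OutputSide { PROBE, BUILD };

// Column indices kept in each half of the (probe, build) result pair.
struct PairLayout {
  std::vector<int> probe_columns;
  std::vector<int> build_columns;
};

PairLayout inner_join_layout(int num_probe_columns,
                             int num_build_columns,
                             std::vector<std::pair<int, int>> const& columns_in_common,
                             OutputSide side)
{
  // True when `col` appears as the probe index P of some (P, B) pair.
  auto is_common_probe = [&](int col) {
    return std::any_of(columns_in_common.begin(), columns_in_common.end(),
                       [col](std::pair<int, int> const& pb) { return pb.first == col; });
  };
  // True when `col` appears as the build index B of some (P, B) pair.
  auto is_common_build = [&](int col) {
    return std::any_of(columns_in_common.begin(), columns_in_common.end(),
                       [col](std::pair<int, int> const& pb) { return pb.second == col; });
  };

  PairLayout layout;
  // The probe half keeps common columns only when side is PROBE.
  for (int c = 0; c < num_probe_columns; ++c) {
    if (side == OutputSide::PROBE || !is_common_probe(c)) { layout.probe_columns.push_back(c); }
  }
  // The build half keeps common columns only when side is BUILD.
  for (int c = 0; c < num_build_columns; ++c) {
    if (side == OutputSide::BUILD || !is_common_build(c)) { layout.build_columns.push_back(c); }
  }
  return layout;
}
```

For example, with three probe columns, two build columns, and the single common pair (0, 1), the PROBE setting keeps all probe columns and only build column 0, while the BUILD setting keeps probe columns 1 and 2 and both build columns.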

◆ left_join()

std::unique_ptr<cudf::table> cudf::hash_join::left_join ( cudf::table_view const &  probe,
std::vector< size_type > const &  probe_on,
std::vector< std::pair< cudf::size_type, cudf::size_type >> const &  columns_in_common,
null_equality  compare_nulls = null_equality::EQUAL,
rmm::cuda_stream_view  stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Performs a left join by probing in the internal hash table.

For more details, see cudf::left_join().
Parameters
probe: The probe table, from which the tuples are probed.
probe_on: The column indices from probe to join on.
columns_in_common: A vector of pairs of column indices into probe and build, respectively, that are "in common". For each "common" pair, only a single output column is produced, gathered from the probe_on columns. Otherwise, an output column is produced for every column in probe_on and build_on. For each pair (P, B), P should exist in probe_on and B should exist in build_on.
compare_nulls: Controls whether null join-key values should match or not.
mr: Device memory resource used to allocate the returned table and columns' device memory.
stream: CUDA stream used for device memory operations and kernel launches.
Returns
Result of joining the build and probe tables on the columns specified by build_on and probe_on. The resulting table contains the columns of probe (including common columns) plus the columns of build (excluding common columns).
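
The shape of a left join result can be sketched on the CPU as follows. This is a conceptual illustration under stated assumptions, not libcudf code: LeftJoinRow and left_join_sketch are hypothetical, each table has one integer key column (treated as the common column) and one string payload column, and std::optional models a null in the build portion.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One output row: probe columns (including the common key) plus the
// build payload (the build key is omitted because it is "common").
struct LeftJoinRow {
  int key;                                 // common column, taken from the probe side
  std::string probe_value;                 // remaining probe column
  std::optional<std::string> build_value;  // empty when no build row matched
};

std::vector<LeftJoinRow> left_join_sketch(
  std::vector<std::pair<int, std::string>> const& probe,
  std::vector<std::pair<int, std::string>> const& build)
{
  // Build phase: map each build key to its payload.
  std::unordered_multimap<int, std::string> table;
  for (auto const& row : build) { table.emplace(row.first, row.second); }

  std::vector<LeftJoinRow> out;
  for (auto const& row : probe) {
    auto const range = table.equal_range(row.first);
    // Every probe row is kept; unmatched rows get a null build payload.
    if (range.first == range.second) {
      out.push_back({row.first, row.second, std::nullopt});
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      out.push_back({row.first, row.second, it->second});
    }
  }
  return out;
}
```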

The documentation for this class was generated from the following file: