Hash join that builds its hash table at construction and probes it in subsequent *_join
member function calls.
#include <join.hpp>
Public Types | |
enum | common_columns_output_side { common_columns_output_side::PROBE, common_columns_output_side::BUILD } |
Controls where common columns will be output for an inner join. | |
Public Member Functions | |
hash_join (hash_join const &)=delete | |
hash_join (hash_join &&)=delete | |
hash_join & | operator= (hash_join const &)=delete |
hash_join & | operator= (hash_join &&)=delete |
hash_join (cudf::table_view const &build, std::vector< size_type > const &build_on, null_equality compare_nulls, rmm::cuda_stream_view stream=rmm::cuda_stream_default) | |
Construct a hash join object for subsequent probe calls. | |
std::pair< std::unique_ptr< cudf::table >, std::unique_ptr< cudf::table > > | inner_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, common_columns_output_side common_columns_output_side=common_columns_output_side::PROBE, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const |
Performs an inner join by probing in the internal hash table. | |
std::unique_ptr< cudf::table > | left_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const |
Performs a left join by probing in the internal hash table. | |
std::unique_ptr< cudf::table > | full_join (cudf::table_view const &probe, std::vector< size_type > const &probe_on, std::vector< std::pair< cudf::size_type, cudf::size_type >> const &columns_in_common, null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=rmm::cuda_stream_default, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const |
Performs a full join by probing in the internal hash table. | |
This class enables a hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel).
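The build-once, probe-many scheme described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `toy_hash_join` is a hypothetical name, it joins on a single `int` key using `std::unordered_multimap`, and it returns row-index pairs rather than tables. It is not the cudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side model of "build once, probe many"; not cudf code.
class toy_hash_join {
 public:
  // Build phase: index every build-side key by its row number, exactly once.
  explicit toy_hash_join(std::vector<int> const& build_keys) {
    for (std::size_t row = 0; row < build_keys.size(); ++row) {
      table_.emplace(build_keys[row], row);
    }
  }

  // Probe phase: const, so the same object can be probed repeatedly.
  // Returns one (probe_row, build_row) pair per key match.
  std::vector<std::pair<std::size_t, std::size_t>> inner_join(
      std::vector<int> const& probe_keys) const {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t row = 0; row < probe_keys.size(); ++row) {
      auto const range = table_.equal_range(probe_keys[row]);
      for (auto it = range.first; it != range.second; ++it) {
        matches.emplace_back(row, it->second);
      }
    }
    return matches;
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;
};
```

Because the probe function in this model is `const` and only reads the table, one object can serve any number of probe calls after a single build.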
cudf::hash_join::hash_join(cudf::table_view const& build,
                           std::vector<size_type> const& build_on,
                           null_equality compare_nulls,
                           rmm::cuda_stream_view stream = rmm::cuda_stream_default)
Construct a hash join object for subsequent probe calls.
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
build | The build table, from which the hash table is built. |
build_on | The column indices from build to join on. |
compare_nulls | Controls whether null join-key values should match or not. |
stream | CUDA stream used for device memory operations and kernel launches |
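The effect of compare_nulls can be sketched with a host-side model in which `std::nullopt` stands in for a null key. The names `toy_null_equality`, `keys_match`, and `count_matches` are hypothetical, the `UNEQUAL` enumerator is an assumption of this sketch, and the nested loop is used only for brevity. This is not how cudf compares keys internally.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the compare_nulls parameter; not the cudf type.
enum class toy_null_equality { EQUAL, UNEQUAL };

using nullable_key = std::optional<int>;  // std::nullopt models a null key

// Decide whether two join keys match under the given null policy.
bool keys_match(nullable_key const& a, nullable_key const& b,
                toy_null_equality compare_nulls) {
  if (!a.has_value() || !b.has_value()) {
    // A null never matches a non-null; two nulls match only under EQUAL.
    return !a.has_value() && !b.has_value() &&
           compare_nulls == toy_null_equality::EQUAL;
  }
  return *a == *b;
}

// Count matching (probe, build) row pairs with a plain nested loop.
std::size_t count_matches(std::vector<nullable_key> const& probe,
                          std::vector<nullable_key> const& build,
                          toy_null_equality compare_nulls) {
  std::size_t matches = 0;
  for (auto const& p : probe) {
    for (auto const& b : build) {
      if (keys_match(p, b, compare_nulls)) { ++matches; }
    }
  }
  return matches;
}
```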
std::unique_ptr<cudf::table> cudf::hash_join::full_join(
    cudf::table_view const& probe,
    std::vector<size_type> const& probe_on,
    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
    null_equality compare_nulls = null_equality::EQUAL,
    rmm::cuda_stream_view stream = rmm::cuda_stream_default,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) const
Performs a full join by probing in the internal hash table.
For more details, see cudf::full_join().
probe | The probe table, from which the tuples are probed. |
probe_on | The column indices from probe to join on. |
columns_in_common | is a vector of pairs of column indices into probe and build , respectively, that are "in common". For "common" columns, only a single output column will be produced, which is gathered from probe_on columns. Else, for every column in probe_on and build_on , an output column will be produced. For each of these pairs (P, B), P should exist in probe_on and B should exist in build_on . |
compare_nulls | Controls whether null join-key values should match or not. |
mr | Device memory resource used to allocate the returned table and columns' device memory. |
stream | CUDA stream used for device memory operations and kernel launches |
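As a rough illustration of full-join row semantics, the following host-side sketch emits every matching row pair plus every unmatched row from either side, with `std::nullopt` marking the missing side. `toy_full_join` and `maybe_row` are hypothetical names, the join is on a single `int` key, and the output is row indices rather than a table. It is not the cudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

using maybe_row = std::optional<std::size_t>;  // std::nullopt models "no matching row"

// Hypothetical host-side model of a full join on one int key; not cudf code.
// Each output element is (probe_row, build_row).
std::vector<std::pair<maybe_row, maybe_row>> toy_full_join(
    std::vector<int> const& probe_keys, std::vector<int> const& build_keys) {
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    table.emplace(build_keys[row], row);
  }

  std::vector<bool> build_matched(build_keys.size(), false);
  std::vector<std::pair<maybe_row, maybe_row>> out;
  for (std::size_t row = 0; row < probe_keys.size(); ++row) {
    auto const range = table.equal_range(probe_keys[row]);
    if (range.first == range.second) {
      out.emplace_back(row, std::nullopt);  // probe row without a partner
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      build_matched[it->second] = true;
      out.emplace_back(row, it->second);
    }
  }
  // Append build rows that no probe row matched.
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    if (!build_matched[row]) { out.emplace_back(std::nullopt, row); }
  }
  return out;
}
```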
Returns the result of joining the build and probe tables on the columns specified by build_on and probe_on. The resulting table contains the joined columns of probe (including common columns) followed by build (excluding common columns).
std::pair<std::unique_ptr<cudf::table>, std::unique_ptr<cudf::table>> cudf::hash_join::inner_join(
    cudf::table_view const& probe,
    std::vector<size_type> const& probe_on,
    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
    common_columns_output_side common_columns_output_side = common_columns_output_side::PROBE,
    null_equality compare_nulls = null_equality::EQUAL,
    rmm::cuda_stream_view stream = rmm::cuda_stream_default,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) const
Performs an inner join by probing in the internal hash table.
Because it is sometimes desirable to choose the smaller table as the build side of an inner join, a (probe, build) table pair is returned, containing the probe and build portions of the logical joined table respectively, so that the caller can freely rearrange them to restore the logical left/right order. This introduces some extra logic about where "common" columns should go: the legacy cudf::inner_join() API always outputs "common" columns in the left portion and omits the corresponding columns from the right portion. To better align with the legacy cudf::inner_join() API, the common_columns_output_side parameter specifies whether "common" columns should go in the probe or build portion.
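The rearrangement described above can be sketched on the host as follows: the smaller input is chosen as the build side, and each match is then reported in (left, right) order regardless of which side was built. `toy_inner_join_left_right` and `row_pair` are hypothetical names, the join is on a single `int` key, and only row indices are returned. This is an illustration of the idea, not the cudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using row_pair = std::pair<std::size_t, std::size_t>;  // (left_row, right_row)

// Hypothetical helper: builds on the smaller side, always reports (left, right).
std::vector<row_pair> toy_inner_join_left_right(std::vector<int> const& left_keys,
                                                std::vector<int> const& right_keys) {
  bool const build_is_left = left_keys.size() < right_keys.size();
  std::vector<int> const& build = build_is_left ? left_keys : right_keys;
  std::vector<int> const& probe = build_is_left ? right_keys : left_keys;

  // Build the hash table from the smaller side.
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t row = 0; row < build.size(); ++row) {
    table.emplace(build[row], row);
  }

  std::vector<row_pair> out;
  for (std::size_t probe_row = 0; probe_row < probe.size(); ++probe_row) {
    auto const range = table.equal_range(probe[probe_row]);
    for (auto it = range.first; it != range.second; ++it) {
      std::size_t const build_row = it->second;
      // Restore the logical left/right order of the match.
      if (build_is_left) {
        out.emplace_back(build_row, probe_row);
      } else {
        out.emplace_back(probe_row, build_row);
      }
    }
  }
  return out;
}
```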
For more details, see cudf::inner_join().
probe | The probe table, from which the tuples are probed. |
probe_on | The column indices from probe to join on. |
columns_in_common | is a vector of pairs of column indices into probe and build, respectively, that are "in common". For "common" columns, only a single output column will be produced, which is gathered from the probe_on columns if common_columns_output_side is PROBE, or from the build_on columns if it is BUILD. Else, for every column in probe_on and build_on, an output column will be produced. For each of these pairs (P, B), P should exist in probe_on and B should exist in build_on. |
common_columns_output_side | Specifies whether "common" columns are output in the probe or build portion; see common_columns_output_side. |
compare_nulls | Controls whether null join-key values should match or not. |
mr | Device memory resource used to allocate the returned table and columns' device memory. |
stream | CUDA stream used for device memory operations and kernel launches |
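To make the interaction between columns_in_common and common_columns_output_side concrete, the sketch below computes which column indices would appear in each portion of the returned pair. `toy_output_side`, `column_pairs`, and `toy_output_columns` are hypothetical names, and the model tracks column indices only. It is not the cudf implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for common_columns_output_side; not the cudf type.
enum class toy_output_side { PROBE, BUILD };

using column_pairs = std::vector<std::pair<int, int>>;  // (probe column, build column)

// Returns (probe columns kept, build columns kept) for the output pair.
std::pair<std::vector<int>, std::vector<int>> toy_output_columns(
    int num_probe_columns, int num_build_columns,
    column_pairs const& columns_in_common, toy_output_side side) {
  auto const is_common_probe = [&](int c) {
    return std::any_of(columns_in_common.begin(), columns_in_common.end(),
                       [c](std::pair<int, int> const& p) { return p.first == c; });
  };
  auto const is_common_build = [&](int c) {
    return std::any_of(columns_in_common.begin(), columns_in_common.end(),
                       [c](std::pair<int, int> const& p) { return p.second == c; });
  };

  // A common column is kept only on the side selected by `side`.
  std::vector<int> probe_out;
  for (int c = 0; c < num_probe_columns; ++c) {
    if (side == toy_output_side::PROBE || !is_common_probe(c)) { probe_out.push_back(c); }
  }
  std::vector<int> build_out;
  for (int c = 0; c < num_build_columns; ++c) {
    if (side == toy_output_side::BUILD || !is_common_build(c)) { build_out.push_back(c); }
  }
  return {probe_out, build_out};
}
```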
Returns a table pair (probe, build) resulting from joining both tables on the columns specified by probe_on and build_on. The resulting table pair contains the joined columns of (probe (including common columns), build (excluding common columns)) if common_columns_output_side is PROBE, or (probe (excluding common columns), build (including common columns)) if common_columns_output_side is BUILD.
std::unique_ptr<cudf::table> cudf::hash_join::left_join(
    cudf::table_view const& probe,
    std::vector<size_type> const& probe_on,
    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
    null_equality compare_nulls = null_equality::EQUAL,
    rmm::cuda_stream_view stream = rmm::cuda_stream_default,
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) const
Performs a left join by probing in the internal hash table.
For more details, see cudf::left_join().
probe | The probe table, from which the tuples are probed. |
probe_on | The column indices from probe to join on. |
columns_in_common | is a vector of pairs of column indices into probe and build , respectively, that are "in common". For "common" columns, only a single output column will be produced, which is gathered from probe_on columns. Else, for every column in probe_on and build_on , an output column will be produced. For each of these pairs (P, B), P should exist in probe_on and B should exist in build_on . |
compare_nulls | Controls whether null join-key values should match or not. |
mr | Device memory resource used to allocate the returned table and columns' device memory. |
stream | CUDA stream used for device memory operations and kernel launches |
Returns the result of joining the build and probe tables on the columns specified by build_on and probe_on. The resulting table contains the joined columns of probe (including common columns) followed by build (excluding common columns).
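Left-join row semantics, with the probe table acting as the left side, can be sketched as follows: every probe row appears at least once, paired with `std::nullopt` when no build row has the same key. `toy_left_join` is a hypothetical name, the join is on a single `int` key, and the output is row indices rather than a table. It is not the cudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side model of a left join on one int key; not cudf code.
// Each output element is (probe_row, matching build_row or std::nullopt).
std::vector<std::pair<std::size_t, std::optional<std::size_t>>> toy_left_join(
    std::vector<int> const& probe_keys, std::vector<int> const& build_keys) {
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    table.emplace(build_keys[row], row);
  }

  std::vector<std::pair<std::size_t, std::optional<std::size_t>>> out;
  for (std::size_t row = 0; row < probe_keys.size(); ++row) {
    auto const range = table.equal_range(probe_keys[row]);
    if (range.first == range.second) {
      out.emplace_back(row, std::nullopt);  // keep the unmatched probe row
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      out.emplace_back(row, it->second);
    }
  }
  return out;
}
```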