Hash join that builds a hash table on construction and probes it in subsequent *_join
member function calls.
#include <hash_join.hpp>
Public Types
using impl_type = typename cudf::detail::hash_join<cudf::hashing::detail::MurmurHash3_x86_32<cudf::hash_value_type>>
    Implementation type.
This class enables a hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel).
Definition at line 75 of file hash_join.hpp.
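The build-once, probe-many scheme can be illustrated with a small CPU analogy. The sketch below is not libcudf code: `toy_hash_join` is a hypothetical class, keys are plain `int` values instead of table rows, and `std::unordered_multimap` stands in for the device hash table. It only shows why the probe functions can be `const` and called repeatedly on one object.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for the build-once / probe-many scheme.
// Not libcudf code: keys are plain ints and indices are std::size_t.
class toy_hash_join {
 public:
  // Build phase: index every build row by its key, exactly once.
  explicit toy_hash_join(std::vector<int> const& build_keys)
  {
    for (std::size_t i = 0; i < build_keys.size(); ++i) {
      table_.emplace(build_keys[i], i);
    }
  }

  // Probe phase: const, so the same object can serve many probe tables.
  // Returns [probe_indices, build_indices], one entry per matching pair.
  std::pair<std::vector<std::size_t>, std::vector<std::size_t>> inner_join(
    std::vector<int> const& probe_keys) const
  {
    std::vector<std::size_t> probe_idx;
    std::vector<std::size_t> build_idx;
    for (std::size_t p = 0; p < probe_keys.size(); ++p) {
      auto const range = table_.equal_range(probe_keys[p]);
      for (auto it = range.first; it != range.second; ++it) {
        probe_idx.push_back(p);
        build_idx.push_back(it->second);
      }
    }
    return {std::move(probe_idx), std::move(build_idx)};
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;
};
```

A caller would construct one `toy_hash_join` from the build keys and then call `inner_join` with as many probe vectors as needed. The order of build indices within one probe row is unspecified here, because `std::unordered_multimap` does not order equal keys.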

cudf::hash_join::hash_join(cudf::table_view const& build,
                           null_equality compare_nulls,
                           rmm::cuda_stream_view stream = cudf::get_default_stream())
Construct a hash join object for subsequent probe calls.
Note
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
Exceptions
std::invalid_argument | if the build table has no columns
Parameters
build | The build table, from which the hash table is built
compare_nulls | Controls whether null join-key values should match or not
stream | CUDA stream used for device memory operations and kernel launches

cudf::hash_join::hash_join(cudf::table_view const& build,
                           nullable_join has_nulls,
                           null_equality compare_nulls,
                           double load_factor,
                           rmm::cuda_stream_view stream = cudf::get_default_stream())
Construct a hash join object for subsequent probe calls.
Note
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
Exceptions
std::invalid_argument | if the build table has no columns
std::invalid_argument | if load_factor is not greater than 0 and less than or equal to 1
Parameters
build | The build table, from which the hash table is built
has_nulls | Flag indicating whether any nulls exist in the build table or in any probe table that will later be used for the join
compare_nulls | Controls whether null join-key values should match or not
load_factor | The hash table occupancy ratio in (0,1]. A value of 0.5 means 50% desired occupancy.
stream | CUDA stream used for device memory operations and kernel launches
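To make the occupancy ratio concrete, here is a minimal sketch of the arithmetic it implies. `slots_for` is a hypothetical helper, not a libcudf function, and the rounding rule is an assumption: this page does not say how the library actually sizes its table. The range check mirrors the documented (0,1] contract.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper, not part of libcudf: number of slots needed so that
// num_rows entries fill the table to roughly load_factor occupancy.
std::size_t slots_for(std::size_t num_rows, double load_factor)
{
  // Mirrors the documented contract: load_factor must lie in (0, 1].
  // The negated form also rejects NaN.
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  // Assumed rounding: round up so the requested occupancy is never exceeded.
  return static_cast<std::size_t>(std::ceil(static_cast<double>(num_rows) / load_factor));
}
```

Under these assumptions, 100 build rows at a load factor of 0.5 call for 200 slots, while a load factor of 1.0 calls for exactly 100. Lower values trade memory for fewer collisions.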

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::full_join(cudf::table_view const& probe,
                           std::optional<std::size_t> output_size = {},
                           rmm::cuda_stream_view stream = cudf::get_default_stream(),
                           rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns the row indices that can be used to construct the result of performing a full join between two tables.
Note
Behavior is undefined if the provided output_size is smaller than the actual output size.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory.
Returns
A pair [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with build and probe as the join keys.

cudf::join_match_context
cudf::hash_join::full_join_match_context(cudf::table_view const& probe,
                                         rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                         rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns context information about matches between the probe and build tables.
This method computes, for each row in the probe table, how many matching rows exist in the build table according to full join semantics, and returns the number of matches through a join_match_context object.
For full join, this includes matches for probe table rows, and the result may need to be combined with unmatched rows from the build table to get the complete picture.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table to join with the pre-processed build table
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the result device memory
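The remark about combining per-probe-row matches with unmatched build rows can be sketched on the CPU as follows. This is an illustration only, not libcudf code: `full_join_counts`, `count_full_join`, and `full_join_total` are hypothetical names, keys are plain `int` values, and the sketch assumes that an unmatched probe row contributes one output row.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical CPU illustration, not libcudf code.
struct full_join_counts {
  std::vector<std::size_t> per_probe_row;  // output rows contributed by each probe row
  std::size_t unmatched_build_rows;        // build rows that no probe row matched
};

full_join_counts count_full_join(std::vector<int> const& build_keys,
                                 std::vector<int> const& probe_keys)
{
  // Build phase: key -> number of build rows holding that key.
  std::unordered_map<int, std::size_t> build_count;
  for (int k : build_keys) { ++build_count[k]; }

  full_join_counts out{};
  std::unordered_set<int> matched_keys;
  for (int k : probe_keys) {
    auto const it = build_count.find(k);
    if (it == build_count.end()) {
      // Assumption: an unmatched probe row still yields one output row.
      out.per_probe_row.push_back(1);
    } else {
      out.per_probe_row.push_back(it->second);
      matched_keys.insert(k);
    }
  }
  // Build rows whose key never appeared in the probe table.
  for (auto const& kv : build_count) {
    if (matched_keys.count(kv.first) == 0) { out.unmatched_build_rows += kv.second; }
  }
  return out;
}

// The complete full-join row count needs both parts.
std::size_t full_join_total(full_join_counts const& c)
{
  std::size_t total = c.unmatched_build_rows;
  for (std::size_t n : c.per_probe_row) { total += n; }
  return total;
}
```

The per-probe-row counts alone undercount the full join whenever some build rows have no partner, which is why the two parts are kept separate and summed at the end.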

std::size_t
cudf::hash_join::full_join_size(cudf::table_view const& probe,
                                rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns the exact number of matches (rows) when performing a full join with the specified probe table.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the intermediate table and columns' device memory.
Returns
The exact number of output rows when performing a full join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::inner_join(cudf::table_view const& probe,
                            std::optional<std::size_t> output_size = {},
                            rmm::cuda_stream_view stream = cudf::get_default_stream(),
                            rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns the row indices that can be used to construct the result of performing an inner join between two tables.
Note
Behavior is undefined if the provided output_size is smaller than the actual output size.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory.
Returns
A pair [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with build and probe as the join keys.

cudf::join_match_context
cudf::hash_join::inner_join_match_context(cudf::table_view const& probe,
                                          rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                          rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns context information about matches between the probe and build tables.
This method computes, for each row in the probe table, how many matching rows exist in the build table according to inner join semantics, and returns the number of matches through a join_match_context object.
This is particularly useful for:
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table to join with the pre-processed build table
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the result device memory

std::size_t
cudf::hash_join::inner_join_size(cudf::table_view const& probe,
                                 rmm::cuda_stream_view stream = cudf::get_default_stream()) const
Returns the exact number of matches (rows) when performing an inner join with the specified probe table.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
Returns
The exact number of output rows when performing an inner join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::left_join(cudf::table_view const& probe,
                           std::optional<std::size_t> output_size = {},
                           rmm::cuda_stream_view stream = cudf::get_default_stream(),
                           rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns the row indices that can be used to construct the result of performing a left join between two tables.
Note
Behavior is undefined if the provided output_size is smaller than the actual output size.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory.
Returns
A pair [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with build and probe as the join keys.

cudf::join_match_context
cudf::hash_join::left_join_match_context(cudf::table_view const& probe,
                                         rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                         rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()) const
Returns context information about matches between the probe and build tables.
This method computes, for each row in the probe table, how many matching rows exist in the build table according to left join semantics, and returns the number of matches through a join_match_context object.
For left join, every row in the probe table will have at least one match (either with a matching row from the build table or with a null placeholder).
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table to join with the pre-processed build table
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the result device memory
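The "at least one match" rule for left joins can be shown with a short CPU sketch. It is not libcudf code: `left_match_counts` and `left_join_rows` are hypothetical helpers, and plain `int` keys stand in for table rows. The sketch assumes that summing the per-row counts gives the total number of left-join output rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical CPU illustration, not libcudf code.
// Returns, for each probe row, the number of output rows it produces.
std::vector<std::size_t> left_match_counts(std::vector<int> const& build_keys,
                                           std::vector<int> const& probe_keys)
{
  // Build phase: key -> number of build rows holding that key.
  std::unordered_map<int, std::size_t> build_count;
  for (int k : build_keys) { ++build_count[k]; }

  std::vector<std::size_t> counts;
  counts.reserve(probe_keys.size());
  for (int k : probe_keys) {
    auto const it = build_count.find(k);
    std::size_t const matches = (it == build_count.end()) ? 0 : it->second;
    // Left semantics: an unmatched probe row is kept and paired with a null placeholder.
    counts.push_back(std::max<std::size_t>(matches, 1));
  }
  return counts;
}

// Summing the per-row counts gives the total number of left-join output rows.
std::size_t left_join_rows(std::vector<std::size_t> const& counts)
{
  return std::accumulate(counts.begin(), counts.end(), std::size_t{0});
}
```

With inner-join semantics the `std::max` clamp would be dropped, so unmatched probe rows would count as zero.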

std::size_t
cudf::hash_join::left_join_size(cudf::table_view const& probe,
                                rmm::cuda_stream_view stream = cudf::get_default_stream()) const
Returns the exact number of matches (rows) when performing a left join with the specified probe table.
Exceptions
std::invalid_argument | If the input probe table has nulls while this hash_join object was not constructed with null check.
Parameters
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
Returns
The exact number of output rows when performing a left join between two tables with build and probe as the join keys.