Hash join that builds a hash table on construction and probes it in subsequent *_join member functions.
#include <join.hpp>
Public Types
using impl_type = typename cudf::detail::hash_join<cudf::hashing::detail::MurmurHash3_x86_32<cudf::hash_value_type>>
    Implementation type.
This class enables a hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel).
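To make the build-once, probe-many idea concrete, here is a minimal CPU-only analogue. It is not cudf code and does not use the cudf API: it assumes a single `int` key column, no nulls, and a `std::unordered_multimap` in place of the GPU hash table. The class name `CpuHashJoin` is hypothetical. The constructor plays the role of the build step, and the `const` probe method returns a pair of row-index vectors, one entry per output row, in the same spirit as the `[left_indices, right_indices]` pairs returned by the member functions below.

```cpp
// Hypothetical CPU-only analogue of "build once, probe many times"; not cudf code.
// One int key column, std::unordered_multimap instead of a GPU hash table, no nulls.
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class CpuHashJoin {
 public:
  // Build phase: runs once, hashing every build key together with its row index.
  explicit CpuHashJoin(std::vector<int> const& build) {
    for (std::size_t row = 0; row < build.size(); ++row) {
      table_.emplace(build[row], static_cast<int>(row));
    }
  }

  // Probe phase: const and read-only, so it can be called any number of times
  // against the same hash table. Returns {probe row indices, build row indices}.
  std::pair<std::vector<int>, std::vector<int>> inner_join(std::vector<int> const& probe) const {
    std::vector<int> probe_idx;
    std::vector<int> build_idx;
    for (std::size_t row = 0; row < probe.size(); ++row) {
      auto [first, last] = table_.equal_range(probe[row]);
      for (auto it = first; it != last; ++it) {
        probe_idx.push_back(static_cast<int>(row));
        build_idx.push_back(it->second);
      }
    }
    assert(probe_idx.size() == build_idx.size());  // one index pair per output row
    return {std::move(probe_idx), std::move(build_idx)};
  }

 private:
  std::unordered_multimap<int, int> table_;  // key -> build row index
};
```

Because `inner_join` is `const` and only reads the multimap, several threads may call it concurrently on one `CpuHashJoin` object without rebuilding anything; this is the CPU counterpart of probing "possibly in parallel".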

cudf::hash_join::hash_join(cudf::table_view const& build,
    null_equality compare_nulls,
    rmm::cuda_stream_view stream = cudf::get_default_stream())
Constructs a hash join object for subsequent probe calls.
Note: The hash_join object must not outlive the table viewed by build; otherwise, behavior is undefined.
Parameters:
build | The build table, from which the hash table is built
compare_nulls | Controls whether null join-key values should match or not
stream | CUDA stream used for device memory operations and kernel launches
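The `compare_nulls` parameter decides whether two null keys join with each other. The sketch below illustrates that rule on the CPU only; it is not cudf code. It assumes `std::nullopt` stands for a null key, and `NullEquality`, `keys_match`, and `count_matches` are hypothetical names (the enum merely mirrors the two choices of `null_equality`). The nested-loop count is deliberately naive and exists only to show which pairs match.

```cpp
// Hypothetical sketch of the compare_nulls semantics; not cudf code.
// std::nullopt plays the role of a null join key.
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the null_equality parameter.
enum class NullEquality { EQUAL, UNEQUAL };

// Decides whether a probe key and a build key should be joined.
bool keys_match(std::optional<int> probe_key, std::optional<int> build_key,
                NullEquality compare_nulls) {
  if (!probe_key || !build_key) {
    // A null never matches a non-null; two nulls match only under EQUAL.
    return !probe_key && !build_key && compare_nulls == NullEquality::EQUAL;
  }
  return *probe_key == *build_key;
}

// Naive nested-loop count of matching (probe, build) pairs, for illustration only.
std::size_t count_matches(std::vector<std::optional<int>> const& build,
                          std::vector<std::optional<int>> const& probe,
                          NullEquality compare_nulls) {
  std::size_t matches = 0;
  for (auto const& p : probe) {
    for (auto const& b : build) {
      if (keys_match(p, b, compare_nulls)) ++matches;
    }
  }
  return matches;
}
```

With build keys {1, null, null} and probe keys {null, 1}, EQUAL yields three matching pairs (the probe null pairs with both build nulls), while UNEQUAL yields only the single 1 = 1 pair.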

cudf::hash_join::hash_join(cudf::table_view const& build,
    nullable_join has_nulls,
    null_equality compare_nulls,
    rmm::cuda_stream_view stream = cudf::get_default_stream())
Constructs a hash join object for subsequent probe calls.
Note: The hash_join object must not outlive the table viewed by build; otherwise, behavior is undefined.
Parameters:
build | The build table, from which the hash table is built
has_nulls | Flag indicating whether any nulls exist in the build table or in any probe table that will later be used for a join
compare_nulls | Controls whether null join-key values should match or not
stream | CUDA stream used for device memory operations and kernel launches
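The probe functions below throw cudf::logic_error when the probe table has nulls but the object was not constructed with the null check. The following CPU-only sketch shows that contract in isolation; it is not cudf code. It assumes a single `int` key column with `std::nullopt` as null, uses `std::logic_error` in place of cudf::logic_error, and the names `NullableJoin` and `check_probe_nulls` are hypothetical (the enum only mirrors the idea of the `has_nulls` flag, not cudf's actual enumerators).

```cpp
// Hypothetical sketch of the has_nulls contract; not cudf code.
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the has_nulls flag.
enum class NullableJoin { NO, YES };

// Rejects a probe table containing nulls when the join object was constructed
// with has_nulls == NO, analogous to the documented logic_error.
void check_probe_nulls(std::vector<std::optional<int>> const& probe, NullableJoin has_nulls) {
  bool const probe_has_nulls = std::any_of(
      probe.begin(), probe.end(),
      [](std::optional<int> const& key) { return !key.has_value(); });
  if (probe_has_nulls && has_nulls == NullableJoin::NO) {
    throw std::logic_error("probe table has nulls, but the join was built with has_nulls = NO");
  }
}
```

In this sketch the check runs once per probe call, before any matching work, so a violated promise is reported instead of silently producing a wrong result.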

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::full_join(cudf::table_view const& probe,
    std::optional<std::size_t> output_size = {},
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource()) const
Returns the row indices that can be used to construct the result of performing a full join between two tables. Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters:
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: A pair [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with build and probe as the join keys.

std::size_t cudf::hash_join::full_join_size(cudf::table_view const& probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource()) const
Returns the exact number of matches (rows) when performing a full join with the specified probe table.
Parameters:
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the intermediate table and columns' device memory
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: The exact number of output rows of a full join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::inner_join(cudf::table_view const& probe,
    std::optional<std::size_t> output_size = {},
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource()) const
Returns the row indices that can be used to construct the result of performing an inner join between two tables. Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters:
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: A pair [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with build and probe as the join keys.

std::size_t cudf::hash_join::inner_join_size(cudf::table_view const& probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream()) const
Returns the exact number of matches (rows) when performing an inner join with the specified probe table.
Parameters:
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: The exact number of output rows of an inner join between two tables with build and probe as the join keys.

std::pair<std::unique_ptr<rmm::device_uvector<size_type>>, std::unique_ptr<rmm::device_uvector<size_type>>>
cudf::hash_join::left_join(cudf::table_view const& probe,
    std::optional<std::size_t> output_size = {},
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource()) const
Returns the row indices that can be used to construct the result of performing a left join between two tables. Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters:
probe | The probe table, from which the tuples are probed
output_size | Optional value which allows users to specify the exact output size
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned table and columns' device memory
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: A pair [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with build and probe as the join keys.

std::size_t cudf::hash_join::left_join_size(cudf::table_view const& probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream()) const
Returns the exact number of matches (rows) when performing a left join with the specified probe table.
Parameters:
probe | The probe table, from which the tuples are probed
stream | CUDA stream used for device memory operations and kernel launches
Exceptions:
cudf::logic_error | If the input probe table has nulls while this hash_join object was not constructed with null check
Returns: The exact number of output rows of a left join between two tables with build and probe as the join keys.
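Each *_join_size function reports the exact row count that the matching *_join call would produce, which is what makes the optional output_size argument useful. The CPU-only sketch below illustrates that relationship for inner, left, and full joins; it is not cudf code and makes several assumptions of its own: a single `int` key column, no nulls, `std::vector<int>` instead of device vectors, and a hypothetical sentinel `kNoMatch = -1` marking "no row on this side" (cudf's own representation of unmatched rows is not shown here). `SketchHashJoin`, `IndexPair`, and `kNoMatch` are hypothetical names.

```cpp
// Hypothetical CPU-only sketch (not cudf code): each *_join_size reports the exact
// row count that the matching *_join call produces for the same probe keys.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// {probe row indices, build row indices}, one entry per output row.
using IndexPair = std::pair<std::vector<int>, std::vector<int>>;
constexpr int kNoMatch = -1;  // hypothetical sentinel: "no row on this side"

class SketchHashJoin {
 public:
  explicit SketchHashJoin(std::vector<int> const& build) : build_rows_(build.size()) {
    for (std::size_t r = 0; r < build.size(); ++r) table_.emplace(build[r], static_cast<int>(r));
  }

  std::size_t inner_join_size(std::vector<int> const& probe) const {
    std::size_t n = 0;
    for (int key : probe) n += table_.count(key);
    return n;
  }

  // Every probe row appears at least once in a left join.
  std::size_t left_join_size(std::vector<int> const& probe) const {
    std::size_t n = 0;
    for (int key : probe) n += std::max<std::size_t>(table_.count(key), 1);
    return n;
  }

  // Left-join rows plus one row per build row that no probe key matched.
  std::size_t full_join_size(std::vector<int> const& probe) const {
    std::vector<bool> matched(build_rows_, false);
    for (int key : probe) {
      auto [first, last] = table_.equal_range(key);
      for (auto it = first; it != last; ++it) matched[static_cast<std::size_t>(it->second)] = true;
    }
    auto unmatched = std::count(matched.begin(), matched.end(), false);
    return left_join_size(probe) + static_cast<std::size_t>(unmatched);
  }

  // Unmatched probe rows are paired with kNoMatch; output_size, if given, pre-sizes the buffers.
  IndexPair left_join(std::vector<int> const& probe,
                      std::optional<std::size_t> output_size = {}) const {
    IndexPair out;
    if (output_size) {
      out.first.reserve(*output_size);
      out.second.reserve(*output_size);
    }
    for (std::size_t r = 0; r < probe.size(); ++r) {
      auto [first, last] = table_.equal_range(probe[r]);
      if (first == last) {
        out.first.push_back(static_cast<int>(r));
        out.second.push_back(kNoMatch);
      }
      for (auto it = first; it != last; ++it) {
        out.first.push_back(static_cast<int>(r));
        out.second.push_back(it->second);
      }
    }
    // A caller-supplied size smaller than the real output is not supported; the sketch asserts.
    assert(!output_size || out.first.size() <= *output_size);
    return out;
  }

  // Left join, then every build row that was never matched, paired with kNoMatch.
  IndexPair full_join(std::vector<int> const& probe) const {
    IndexPair out = left_join(probe);
    std::vector<bool> matched(build_rows_, false);
    for (int b : out.second) {
      if (b != kNoMatch) matched[static_cast<std::size_t>(b)] = true;
    }
    for (std::size_t r = 0; r < build_rows_; ++r) {
      if (!matched[r]) {
        out.first.push_back(kNoMatch);
        out.second.push_back(static_cast<int>(r));
      }
    }
    return out;
  }

 private:
  std::size_t build_rows_;
  std::unordered_multimap<int, int> table_;  // key -> build row index
};
```

Calling a *_join_size function first and passing its result as output_size follows a count, allocate once, then fill pattern. With build keys {10, 20, 20, 30} and probe keys {20, 40}, the sketch reports 2 inner, 3 left, and 5 full output rows.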