Hash join that builds a hash table with the right table on construction and probes results in subsequent *_join member functions.
#include <hash_join.hpp>
Public Types | |
| using | impl_type = typename cudf::detail::hash_join< cudf::hashing::detail::MurmurHash3_x86_32< cudf::hash_value_type > > |
| Implementation type. | |
This class enables the hash join scheme that builds with the right table once and probes with many left tables (possibly in parallel).
Definition at line 64 of file hash_join.hpp.
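The build-once, probe-many scheme can be illustrated with a small CPU analog. This is only a sketch under stated assumptions: `cpu_hash_join` is a hypothetical class, keys are plain `int` values held in `std::vector`, and nothing here reflects how cudf builds or probes its device-side hash table.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical CPU analog of the build-once, probe-many scheme.
// Not the cudf implementation: keys are plain ints and a "table" is a vector.
class cpu_hash_join {
 public:
  // Build phase: index every right row by its key, exactly once.
  explicit cpu_hash_join(std::vector<int> const& right)
  {
    for (std::int32_t i = 0; i < static_cast<std::int32_t>(right.size()); ++i) {
      table_.emplace(right[i], i);
    }
  }

  // Probe phase: returns [left_indices, right_indices] for an inner join.
  // The member is const, so one built object can serve many left tables.
  std::pair<std::vector<std::int32_t>, std::vector<std::int32_t>> inner_join(
    std::vector<int> const& left) const
  {
    std::vector<std::int32_t> left_indices;
    std::vector<std::int32_t> right_indices;
    for (std::int32_t i = 0; i < static_cast<std::int32_t>(left.size()); ++i) {
      auto range = table_.equal_range(left[i]);
      for (auto it = range.first; it != range.second; ++it) {
        left_indices.push_back(i);
        right_indices.push_back(it->second);
      }
    }
    return {std::move(left_indices), std::move(right_indices)};
  }

 private:
  std::unordered_multimap<int, std::int32_t> table_;
};
```

Constructing one `cpu_hash_join` and calling `inner_join` with several different left vectors reuses the same index, which is the cost the class description aims to amortize.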
| cudf::hash_join::hash_join | ( | cudf::table_view const & | right, |
| null_equality | compare_nulls, | ||
| rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
| ) |
Construct a hash join object for subsequent probe calls.
The hash_join object must not outlive the table viewed by right; otherwise behavior is undefined.
| std::invalid_argument | if the right table has no columns |
| right | The right table, from which the hash table is built |
| compare_nulls | Controls whether null join-key values should match or not |
| stream | CUDA stream used for device memory operations and kernel launches |
| cudf::hash_join::hash_join | ( | cudf::table_view const & | right, |
| nullable_join | has_nulls, | ||
| null_equality | compare_nulls, | ||
| double | load_factor, | ||
| rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
| ) |
Construct a hash join object for subsequent probe calls.
The hash_join object must not outlive the table viewed by right; otherwise behavior is undefined.
| std::invalid_argument | if the right table has no columns |
| std::invalid_argument | if load_factor is not in the range (0,1] |
| right | The right table, from which the hash table is built |
| has_nulls | Flag indicating whether any nulls exist in the right table or in any left table that will later be used for a join |
| compare_nulls | Controls whether null join-key values should match or not |
| load_factor | The hash table occupancy ratio in (0,1]. A value of 0.5 means 50% desired occupancy. |
| stream | CUDA stream used for device memory operations and kernel launches |
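To make the occupancy ratio concrete, the sketch below validates a load factor against the documented (0,1] range and derives a slot count from it. `slots_for_load_factor` is a hypothetical helper, and the ceiling-division sizing rule is an assumption for illustration; cudf's actual table sizing may round or pad differently.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper, not part of cudf: number of slots needed to hold
// `num_rows` keys at the requested occupancy ratio.
std::size_t slots_for_load_factor(std::size_t num_rows, double load_factor)
{
  // Mirrors the documented precondition: load_factor must lie in (0, 1].
  // The negated form also rejects NaN.
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  // Assumed sizing rule: rows / occupancy, rounded up.
  return static_cast<std::size_t>(std::ceil(static_cast<double>(num_rows) / load_factor));
}
```

Under this assumed rule, 100 rows at a load factor of 0.5 need 200 slots, while a load factor of 1 needs exactly 100. A lower ratio trades memory for fewer collisions during probing.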
| std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::full_join | ( | cudf::table_view const & | left, |
| std::optional< std::size_t > | output_size = {}, |
||
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns the row indices that can be used to construct the result of performing a full join between two tables.
Behavior is undefined if the provided output_size is smaller than the actual output size.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| output_size | Optional value which allows users to specify the exact output size |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned table and columns' device memory. |
A pair [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with left and right as the join keys.
| cudf::join_match_context cudf::hash_join::full_join_match_context | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns context information about matches between the left and right tables.
This method computes, for each row in the left table, how many matching rows exist in the right table according to full join semantics, and returns the number of matches through a join_match_context object.
For a full join, the counts cover rows of the left table only; unmatched rows from the right table must be accounted for separately to obtain the complete result.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table to join with the pre-processed right table |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the result device memory |
| std::size_t cudf::hash_join::full_join_size | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns the exact number of matches (rows) when performing a full join with the specified left table.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the intermediate table and columns' device memory. |
The exact number of output rows when performing a full join between two tables with left and right as the join keys.
| std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::inner_join | ( | cudf::table_view const & | left, |
| std::optional< std::size_t > | output_size = {}, |
||
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns the row indices that can be used to construct the result of performing an inner join between two tables.
Behavior is undefined if the provided output_size is smaller than the actual output size.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| output_size | Optional value which allows users to specify the exact output size |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned table and columns' device memory. |
A pair [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left and right as the join keys.
| cudf::join_match_context cudf::hash_join::inner_join_match_context | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns context information about matches between the left and right tables.
This method computes, for each row in the left table, how many matching rows exist in the right table according to inner join semantics, and returns the number of matches through a join_match_context object.
This is particularly useful for:
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table to join with the pre-processed right table |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the result device memory |
| std::size_t cudf::hash_join::inner_join_size | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
| ) | const |
Returns the exact number of matches (rows) when performing an inner join with the specified left table.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| stream | CUDA stream used for device memory operations and kernel launches |
The exact number of output rows when performing an inner join between two tables with left and right as the join keys.
| std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::left_join | ( | cudf::table_view const & | left, |
| std::optional< std::size_t > | output_size = {}, |
||
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns the row indices that can be used to construct the result of performing a left join between two tables.
Behavior is undefined if the provided output_size is smaller than the actual output size.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| output_size | Optional value which allows users to specify the exact output size |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned table and columns' device memory. |
A pair [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with left and right as the join keys.
| cudf::join_match_context cudf::hash_join::left_join_match_context | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
||
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
||
| ) | const |
Returns context information about matches between the left and right tables.
This method computes, for each row in the left table, how many matching rows exist in the right table according to left join semantics, and returns the number of matches through a join_match_context object.
For left join, every row in the left table will have at least one match (either with a matching row from the right table or with a null placeholder).
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table to join with the pre-processed right table |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the result device memory |
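The difference between inner and left semantics in the per-row match counts can be shown with a CPU sketch. `match_counts` is a hypothetical function operating on plain `int` keys; it does not model `join_match_context` or any cudf internals, only the counting rule described above.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical CPU illustration; not the cudf implementation.
// Returns, for each left key, the number of matching right keys.
// With left_semantics == true, an unmatched left row counts as 1,
// standing for the single output row paired with a null placeholder.
std::vector<std::int32_t> match_counts(std::vector<int> const& left,
                                       std::vector<int> const& right,
                                       bool left_semantics)
{
  // Count how often each key occurs on the right side.
  std::unordered_map<int, std::int32_t> occurrences;
  for (int key : right) { ++occurrences[key]; }

  std::vector<std::int32_t> counts;
  counts.reserve(left.size());
  for (int key : left) {
    auto it = occurrences.find(key);
    std::int32_t const n = (it == occurrences.end()) ? 0 : it->second;
    counts.push_back(left_semantics ? std::max<std::int32_t>(n, 1) : n);
  }
  return counts;
}
```

For left keys {1, 2, 3} and right keys {2, 2, 3}, inner semantics give counts {0, 2, 1} and left semantics give {1, 2, 1}. Summing the counts yields the total output size for that join type, which is one way such counts can help plan output allocation.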
| std::size_t cudf::hash_join::left_join_size | ( | cudf::table_view const & | left, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream() |
||
| ) | const |
Returns the exact number of matches (rows) when performing a left join with the specified left table.
| std::invalid_argument | If the input left table has nulls while this hash_join object was not constructed with null check. |
| left | The left table, from which the tuples are probed |
| stream | CUDA stream used for device memory operations and kernel launches |
The exact number of output rows when performing a left join between two tables with left and right as the join keys.
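The size functions pair naturally with the optional output_size parameter of the join functions: compute the exact size first, then pass it to the join so the output can be allocated once. The sketch below shows that two-pass pattern on the CPU. All names are hypothetical, the right side is a pre-built `std::unordered_multimap`, and `no_match` is an illustrative sentinel, not the value cudf uses for unmatched rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

using index_vec   = std::vector<std::int32_t>;
using right_index = std::unordered_multimap<int, std::int32_t>;

// Hypothetical sentinel for "no matching right row".
constexpr std::int32_t no_match = -1;

// Pass 1: exact number of output rows of a left join.
std::size_t left_join_size(right_index const& right, std::vector<int> const& left)
{
  std::size_t total = 0;
  for (int key : left) {
    std::size_t const n = right.count(key);
    total += (n == 0) ? 1 : n;  // an unmatched left row still emits one row
  }
  return total;
}

// Pass 2: materialize [left_indices, right_indices].
// A caller-supplied exact size lets both vectors be reserved up front.
std::pair<index_vec, index_vec> left_join(right_index const& right,
                                          std::vector<int> const& left,
                                          std::optional<std::size_t> output_size = {})
{
  std::size_t const n = output_size ? *output_size : left_join_size(right, left);
  index_vec left_indices;
  index_vec right_indices;
  left_indices.reserve(n);
  right_indices.reserve(n);
  for (std::int32_t i = 0; i < static_cast<std::int32_t>(left.size()); ++i) {
    auto range = right.equal_range(left[i]);
    if (range.first == range.second) {
      left_indices.push_back(i);
      right_indices.push_back(no_match);
    }
    for (auto it = range.first; it != range.second; ++it) {
      left_indices.push_back(i);
      right_indices.push_back(it->second);
    }
  }
  return {std::move(left_indices), std::move(right_indices)};
}
```

In this CPU version an undersized `output_size` merely causes a reallocation. The documented API is stricter: passing a value smaller than the actual output size is undefined behavior there.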