
Files

file  search.hpp
 Column APIs for lower_bound, upper_bound, and contains.
 

Functions

std::unique_ptr< column > cudf::lower_bound (table_view const &haystack, table_view const &needles, std::vector< order > const &column_order, std::vector< null_order > const &null_precedence, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Find smallest indices in a sorted table where values should be inserted to maintain order.
 
std::unique_ptr< column > cudf::upper_bound (table_view const &haystack, table_view const &needles, std::vector< order > const &column_order, std::vector< null_order > const &null_precedence, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Find largest indices in a sorted table where values should be inserted to maintain order.
 
bool cudf::contains (column_view const &haystack, scalar const &needle, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Check if the given needle value exists in the haystack column.
 
std::unique_ptr< column > cudf::contains (column_view const &haystack, column_view const &needles, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
 Check whether each value in needles exists in the haystack column.
 

Function Documentation

◆ contains() [1/2]

std::unique_ptr<column> cudf::contains ( column_view const &  haystack,
column_view const &  needles,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Check whether each value in needles exists in the haystack column.

The new column has type BOOL and the same size and null mask as the input needles column. That is, any null row in the needles column results in a null row in the output column.

Exceptions
cudf::logic_error  If haystack.type() != needles.type()
Example:
haystack = { 10, 20, 30, 40, 50 }
needles  = { 20, 40, 60, 80 }
result   = { true, true, false, false }
Parameters
haystack  The column containing the search space
needles   A column of values to check for existence in the search space
stream    CUDA stream used for device memory operations and kernel launches
mr        Device memory resource used to allocate the returned column's device memory
Returns
A BOOL column indicating if each element in needles exists in the search space
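
The null-propagation rule above can be illustrated with a host-side analogue. This is a minimal sketch that does not call libcudf: the helper name `contains_host` is hypothetical, a nullable needles column is modeled as `std::vector<std::optional<int>>`, and the haystack is assumed to contain no nulls.

```cpp
#include <cassert>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical host-side analogue of the column overload of contains().
// A null needle (std::nullopt) produces a null result row; every other
// row is true when the needle value occurs anywhere in the haystack.
std::vector<std::optional<bool>> contains_host(
    std::vector<int> const& haystack,
    std::vector<std::optional<int>> const& needles)
{
  // Build a lookup set once so each needle is checked in O(1) on average.
  std::unordered_set<int> const lookup(haystack.begin(), haystack.end());

  std::vector<std::optional<bool>> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    if (needle.has_value()) {
      result.emplace_back(lookup.count(*needle) > 0);
    } else {
      result.emplace_back(std::nullopt);  // null in, null out
    }
  }

  // The output always has the same size as the needles column.
  assert(result.size() == needles.size());
  return result;
}
```

Calling `contains_host({10, 20, 30, 40, 50}, {20, 40, 60, 80})` yields `{ true, true, false, false }`, matching the example above. Replacing any needle with `std::nullopt` makes the corresponding output row null instead of `false`.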

◆ contains() [2/2]

bool cudf::contains ( column_view const &  haystack,
scalar const &  needle,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Check if the given needle value exists in the haystack column.

Exceptions
cudf::logic_error  If haystack.type() != needle.type().
Example:
Single Column:
idx           0   1   2   3   4
haystack = { 10, 20, 20, 30, 50 }
needle   = { 20 }
result   = true
Parameters
haystack  The column containing the search space
needle    A scalar value to check for existence in the search space
stream    CUDA stream used for device memory operations and kernel launches
Returns
true if the given needle value exists in the haystack column

◆ lower_bound()

std::unique_ptr<column> cudf::lower_bound ( table_view const &  haystack,
table_view const &  needles,
std::vector< order > const &  column_order,
std::vector< null_order > const &  null_precedence,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Find smallest indices in a sorted table where values should be inserted to maintain order.

For each row in needles, find the first index in haystack where inserting the row still maintains its sort order.

Example:
Single Column:
idx           0   1   2   3   4
haystack = { 10, 20, 20, 30, 50 }
needles  = { 20 }
result   = { 1 }
Multi Column:
idx             0    1    2    3    4
haystack = {{  10,  20,  20,  20,  20 },
            { 5.0,  .5,  .5,  .7,  .7 },
            {  90,  77,  78,  61,  61 }}
needles  = {{ 20 },
            { .7 },
            { 61 }}
result   = { 3 }
Parameters
haystack         The table containing the search space
needles          Values for which to find the insert locations in the search space
column_order     Vector of sort orders for the columns
null_precedence  Vector of null_order values giving the null precedence of the columns
stream           CUDA stream used for device memory operations and kernel launches
mr               Device memory resource used to allocate the returned column's device memory
Returns
A non-nullable column of elements containing the insertion points
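
The row-wise semantics described above can be sketched on the host with `std::lower_bound`. This is an illustrative analogue, not the libcudf implementation: the helper `lower_bound_host` and the alias `row3` are hypothetical, the table is stored row by row rather than column by column, and every column is assumed to be sorted ascending with no nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One row of the three-column example table (hypothetical host-side type).
using row3 = std::tuple<int, double, int>;

// Hypothetical host-side analogue of lower_bound(): for each needle row,
// return the first index at which it can be inserted into the sorted
// haystack without breaking the sort order.
template <typename Row>
std::vector<std::size_t> lower_bound_host(std::vector<Row> const& haystack,
                                          std::vector<Row> const& needles)
{
  // The search is only meaningful on a haystack that is already sorted.
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  std::vector<std::size_t> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    // For std::tuple rows, operator< compares lexicographically,
    // column by column, which mirrors a multi-column sort.
    auto const it = std::lower_bound(haystack.begin(), haystack.end(), needle);
    result.push_back(static_cast<std::size_t>(it - haystack.begin()));
  }
  return result;
}
```

With `Row = int`, the single-column example gives `{ 1 }`. With `Row = row3` and the five haystack rows `(10, 5.0, 90)`, `(20, .5, 77)`, `(20, .5, 78)`, `(20, .7, 61)`, `(20, .7, 61)`, the needle `(20, .7, 61)` gives `{ 3 }`.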

◆ upper_bound()

std::unique_ptr<column> cudf::upper_bound ( table_view const &  haystack,
table_view const &  needles,
std::vector< order > const &  column_order,
std::vector< null_order > const &  null_precedence,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
)

Find largest indices in a sorted table where values should be inserted to maintain order.

For each row in needles, find the last index in haystack where inserting the row still maintains its sort order.

Example:
Single Column:
idx           0   1   2   3   4
haystack = { 10, 20, 20, 30, 50 }
needles  = { 20 }
result   = { 3 }
Multi Column:
idx             0    1    2    3    4
haystack = {{  10,  20,  20,  20,  20 },
            { 5.0,  .5,  .5,  .7,  .7 },
            {  90,  77,  78,  61,  61 }}
needles  = {{ 20 },
            { .7 },
            { 61 }}
result   = { 5 }
Parameters
haystack         The table containing the search space
needles          Values for which to find the insert locations in the search space
column_order     Vector of sort orders for the columns
null_precedence  Vector of null_order values giving the null precedence of the columns
stream           CUDA stream used for device memory operations and kernel launches
mr               Device memory resource used to allocate the returned column's device memory
Returns
A non-nullable column of elements containing the insertion points
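
A host-side analogue built on `std::upper_bound` shows why the results in the example above are one past the last matching row. This is a sketch under stated assumptions, not the libcudf implementation: `upper_bound_host` and `row3` are hypothetical names, rows are stored as tuples instead of separate columns, and all columns are assumed ascending with no nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One row of the three-column example table (hypothetical host-side type).
using row3 = std::tuple<int, double, int>;

// Hypothetical host-side analogue of upper_bound(): for each needle row,
// return the last index at which it can be inserted into the sorted
// haystack without breaking the sort order.
template <typename Row>
std::vector<std::size_t> upper_bound_host(std::vector<Row> const& haystack,
                                          std::vector<Row> const& needles)
{
  // The search is only meaningful on a haystack that is already sorted.
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  std::vector<std::size_t> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    // std::upper_bound returns the first element strictly greater than
    // the needle, i.e. the position just past any equal rows.
    auto const it = std::upper_bound(haystack.begin(), haystack.end(), needle);
    result.push_back(static_cast<std::size_t>(it - haystack.begin()));
  }
  return result;
}
```

With `Row = int`, the single-column example gives `{ 3 }`, the index just past the two rows equal to 20. With `Row = row3` and the five example rows, the needle `(20, .7, 61)` gives `{ 5 }`, which equals the table size because the matching rows are the last ones.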