Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
#include <tokenize.hpp>
Public Member Functions
tokenize_vocabulary(cudf::strings_column_view const& input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Vocabulary object constructor.
Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
Use nvtext::load_vocabulary to create this object.
Definition at line 238 of file tokenize.hpp.
nvtext::tokenize_vocabulary::tokenize_vocabulary(cudf::strings_column_view const& input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())
Vocabulary object constructor.
Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.
Exceptions
cudf::logic_error | if vocabulary contains nulls or is empty
Parameters
input | Strings for the vocabulary
stream | CUDA stream used for device memory operations and kernel launches
mr | Device memory resource used to allocate the returned column's device memory
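The rule that token ids are the row indices within the vocabulary column can be illustrated with a small host-only sketch. It is not the libcudf implementation and does not call the library: `host_vocabulary`, `make_host_vocabulary`, and `host_tokenize` are hypothetical names invented for this illustration, whitespace splitting and the default id of -1 are assumptions, and nulls are not modeled. The sketch rejects duplicate entries outright, whereas the documented behavior for duplicates is undefined.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for a vocabulary object (not the libcudf type).
struct host_vocabulary {
  std::unordered_map<std::string, std::int32_t> ids;  // token -> row index
};

// Builds the lookup table; each entry's id is its row index in `entries`.
host_vocabulary make_host_vocabulary(std::vector<std::string> const& entries)
{
  // Mirrors the documented precondition that the vocabulary is not empty.
  if (entries.empty()) { throw std::logic_error("vocabulary must not be empty"); }

  host_vocabulary vocab;
  for (std::size_t row = 0; row < entries.size(); ++row) {
    // Assumption of this sketch: duplicates are rejected rather than left undefined.
    bool const inserted =
      vocab.ids.emplace(entries[row], static_cast<std::int32_t>(row)).second;
    if (!inserted) { throw std::logic_error("vocabulary entries must be unique"); }
  }
  return vocab;
}

// Splits `text` on whitespace and maps each token to its row index,
// or to `default_id` when the token is not in the vocabulary.
std::vector<std::int32_t> host_tokenize(std::string const& text,
                                        host_vocabulary const& vocab,
                                        std::int32_t default_id = -1)
{
  std::vector<std::int32_t> out;
  std::istringstream stream(text);
  std::string token;
  while (stream >> token) {
    auto const it = vocab.ids.find(token);
    out.push_back(it != vocab.ids.end() ? it->second : default_id);
  }
  return out;
}
```

With the vocabulary rows "the", "quick", "fox", calling `host_tokenize("the fox jumps quick", vocab)` yields the ids 0, 2, -1, 1: "jumps" has no row in the vocabulary, so it receives the default id.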