Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
#include <tokenize.hpp>
Public Member Functions
tokenize_vocabulary (cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
    Vocabulary object constructor.
Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
Use nvtext::load_vocabulary to create this object.
Definition at line 238 of file tokenize.hpp.
nvtext::tokenize_vocabulary::tokenize_vocabulary (
    cudf::strings_column_view const & input,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)
Vocabulary object constructor.
Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.
Exceptions
| cudf::logic_error | if the vocabulary contains nulls or is empty |
Parameters
| input | Strings for the vocabulary |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
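The constructor contract described above (token id equals row index, no nulls, non-empty input) can be illustrated with a small host-side analogy. This is only a sketch in standard C++17, not the libcudf implementation: `host_vocabulary` and its `find` method are hypothetical names introduced here, `std::optional` stands in for nullable rows, and `std::logic_error` stands in for `cudf::logic_error`. Keeping the first occurrence of a duplicate entry is a choice made for this sketch only; the documented behavior for duplicates is undefined.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for the vocabulary object (not part of libcudf).
class host_vocabulary {
 public:
  // Mirrors the documented contract: an empty vocabulary or a null row is rejected.
  explicit host_vocabulary(std::vector<std::optional<std::string>> const& input)
  {
    if (input.empty()) { throw std::logic_error("vocabulary must not be empty"); }
    for (std::size_t row = 0; row < input.size(); ++row) {
      if (!input[row].has_value()) {
        throw std::logic_error("vocabulary must not contain nulls");
      }
      // The token id is the row index of the entry.
      // emplace() does not overwrite, so a duplicate keeps its first row index
      // (an assumption of this sketch; the real API leaves duplicates undefined).
      ids_.emplace(*input[row], static_cast<int>(row));
    }
  }

  // Returns the token id for `token`, or `default_id` if it is not in the vocabulary.
  int find(std::string const& token, int default_id) const
  {
    auto const it = ids_.find(token);
    return it == ids_.end() ? default_id : it->second;
  }

 private:
  std::unordered_map<std::string, int> ids_;
};
```

With a vocabulary of `the`, `quick`, `fox`, a lookup of `fox` in this sketch yields its row index, 2, and a token that is absent yields whatever default id the caller supplies.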