Public Member Functions
	tokenize_vocabulary (cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Vocabulary object constructor.

Detailed Description

Vocabulary object to be used with nvtext::tokenize_with_vocabulary.

Use nvtext::load_vocabulary to create this object.

Definition at line 228 of file tokenize.hpp.

Constructor & Destructor Documentation

nvtext::tokenize_vocabulary::tokenize_vocabulary(
	cudf::strings_column_view const & input,
	rmm::cuda_stream_view stream = cudf::get_default_stream(),
	rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
)

Vocabulary object constructor.

Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.
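
The row-index rule can be illustrated with a small host-only sketch. It is not the libcudf implementation and uses no cudf types: host_vocabulary, make_host_vocabulary, and lookup_token_id are hypothetical names introduced here for illustration only. The sketch assumes a plain std::vector of strings in place of a strings column, does not model null entries, and rejects duplicates with an exception, whereas the real constructor leaves duplicates undefined. The default_id fallback of -1 is likewise a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side model of a vocabulary; not part of libcudf.
struct host_vocabulary {
  std::unordered_map<std::string, std::int32_t> ids;  // entry -> row index
};

// Build the model: the token id of each entry is its row index.
inline host_vocabulary make_host_vocabulary(std::vector<std::string> const& entries)
{
  // Mirrors the documented precondition that the vocabulary is not empty.
  if (entries.empty()) { throw std::invalid_argument("vocabulary must not be empty"); }

  host_vocabulary vocab;
  for (std::size_t row = 0; row < entries.size(); ++row) {
    bool const inserted =
      vocab.ids.emplace(entries[row], static_cast<std::int32_t>(row)).second;
    // Assumption of this sketch: duplicates are rejected. The documented
    // constructor only states that duplicates lead to undefined behavior.
    if (!inserted) {
      throw std::invalid_argument("duplicate vocabulary entry: " + entries[row]);
    }
  }
  // Every entry was inserted exactly once, so the sizes must agree.
  assert(vocab.ids.size() == entries.size());
  return vocab;
}

// Look up a token; returns default_id when the token is not in the vocabulary.
inline std::int32_t lookup_token_id(host_vocabulary const& vocab,
                                    std::string const& token,
                                    std::int32_t default_id = -1)
{
  auto const it = vocab.ids.find(token);
  return it == vocab.ids.end() ? default_id : it->second;
}
```

With the entries "the", "quick", "fox", this model assigns ids 0, 1, and 2 in row order, so looking up "fox" yields 2 and an unknown token yields the fallback id.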

Exceptions

cudf::logic_error	if the vocabulary contains nulls or is empty

Parameters

input	Strings for the vocabulary
stream	CUDA stream used for device memory operations and kernel launches
mr	Device memory resource used to allocate the vocabulary object's device memory

The documentation for this struct was generated from the following file: