Result object for the subword_tokenize functions.
#include <subword_tokenize.hpp>
Public Attributes

| Type | Member | Description |
|---|---|---|
| uint32_t | nrows_tensor {} | The number of rows for the output token-ids. |
| uint32_t | sequence_length {} | The number of token-ids in each row. |
| std::unique_ptr<cudf::column> | tensor_token_ids | A vector of token-ids for each row. |
| std::unique_ptr<cudf::column> | tensor_attention_mask | This mask identifies which tensor-token-ids are valid. |
| std::unique_ptr<cudf::column> | tensor_metadata | The metadata for each tensor row. |
Result object for the subword_tokenize functions.
Definition at line 75 of file subword_tokenize.hpp.
std::unique_ptr<cudf::column> nvtext::tokenizer_result::tensor_attention_mask
This mask identifies which tensor-token-ids are valid.
This column is of type UINT32 with no null entries.
Definition at line 96 of file subword_tokenize.hpp.
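As a host-side illustration, the sketch below counts the valid tokens in each tensor row of an attention mask. It assumes the mask uses the same flat, row-major nrows_tensor x sequence_length layout as tensor_token_ids and that a valid token is marked with 1 and padding with 0; neither detail is stated on this page. `valid_tokens_per_row` is a hypothetical helper that works on a `std::vector<uint32_t>` already copied to host memory, not on the `cudf::column` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: counts mask entries equal to 1 in each row of a flat,
// row-major (nrows x sequence_length) attention mask held in host memory.
// Assumption: 1 marks a valid token-id and 0 marks padding.
std::vector<uint32_t> valid_tokens_per_row(std::vector<uint32_t> const& mask,
                                           uint32_t nrows,
                                           uint32_t sequence_length)
{
  if (mask.size() != static_cast<std::size_t>(nrows) * sequence_length) {
    throw std::invalid_argument("mask size must equal nrows * sequence_length");
  }
  std::vector<uint32_t> counts(nrows, 0);
  for (uint32_t r = 0; r < nrows; ++r) {
    for (uint32_t c = 0; c < sequence_length; ++c) {
      // Row-major layout: element (r, c) lives at r * sequence_length + c.
      if (mask[static_cast<std::size_t>(r) * sequence_length + c] == 1) { ++counts[r]; }
    }
  }
  return counts;
}
```

For example, a mask of `{1,1,1,0, 1,1,0,0}` with two rows of length 4 yields counts of 3 and 2.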
std::unique_ptr<cudf::column> nvtext::tokenizer_result::tensor_metadata
The metadata for each tensor row.
There are three elements per tensor row: [row-id, start_pos, stop_pos]. This column is of type UINT32 with no null entries.
Definition at line 103 of file subword_tokenize.hpp.
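Because the metadata is stored as consecutive triples, the triple for tensor row i occupies elements 3i, 3i+1 and 3i+2. The minimal sketch below splits such a buffer into one record per tensor row. The `row_metadata` struct and `decode_metadata` function are hypothetical names introduced for illustration, and the input is assumed to be a host copy of the column data rather than the `cudf::column` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side view of one [row-id, start_pos, stop_pos] triple.
struct row_metadata {
  uint32_t row_id;
  uint32_t start_pos;
  uint32_t stop_pos;
};

// Splits a flat metadata buffer [row-id, start_pos, stop_pos, row-id, ...]
// into one row_metadata record per tensor row.
std::vector<row_metadata> decode_metadata(std::vector<uint32_t> const& flat)
{
  if (flat.size() % 3 != 0) {
    throw std::invalid_argument("metadata size must be a multiple of 3");
  }
  std::vector<row_metadata> rows;
  rows.reserve(flat.size() / 3);
  for (std::size_t i = 0; i < flat.size(); i += 3) {
    rows.push_back({flat[i], flat[i + 1], flat[i + 2]});
  }
  return rows;
}
```

The number of decoded records should equal nrows_tensor, which gives a cheap consistency check on the column size.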
std::unique_ptr<cudf::column> nvtext::tokenizer_result::tensor_token_ids
A vector of token-ids for each row.
The data is a flat matrix (nrows_tensor x sequence_length) of token-ids. This column is of type UINT32 with no null entries.
Definition at line 90 of file subword_tokenize.hpp.
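Since the token-ids form a flat nrows_tensor x sequence_length matrix, the token at tensor row r and column c sits at index r * sequence_length + c, assuming row-major order. The sketch below copies one tensor row out of such a buffer. `token_row` is a hypothetical helper operating on a host-side `std::vector<uint32_t>`, not part of the nvtext API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies tensor row r out of a flat, row-major
// (nrows x sequence_length) buffer of token-ids held in host memory.
std::vector<uint32_t> token_row(std::vector<uint32_t> const& token_ids,
                                uint32_t nrows,
                                uint32_t sequence_length,
                                uint32_t r)
{
  if (token_ids.size() != static_cast<std::size_t>(nrows) * sequence_length) {
    throw std::invalid_argument("size must equal nrows * sequence_length");
  }
  if (r >= nrows) { throw std::out_of_range("row index out of range"); }
  // Row r starts at offset r * sequence_length and spans sequence_length entries.
  auto const offset = static_cast<std::ptrdiff_t>(static_cast<std::size_t>(r) * sequence_length);
  auto const first  = token_ids.begin() + offset;
  return std::vector<uint32_t>(first, first + static_cast<std::ptrdiff_t>(sequence_length));
}
```

Copying whole rows keeps the example simple; a real consumer would more likely index the device buffer directly or hand it to a tensor library without copying.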