std::unique_ptr< cudf::column > wordpiece_tokenize(cudf::strings_column_view const &input, wordpiece_vocabulary const &vocabulary, cudf::size_type max_words_per_row=0, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Returns the token ids for the input strings using a wordpiece tokenizer algorithm with the given vocabulary.
std::unique_ptr< wordpiece_vocabulary > load_wordpiece_vocabulary(cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Creates a wordpiece_vocabulary object from a strings column.
Vocabulary object to be used with nvtext::wordpiece_tokenize.
wordpiece_vocabulary(cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
Vocabulary object constructor.
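
As a rough illustration of what a wordpiece tokenizer does with a vocabulary, the following is a CPU-only sketch of greedy longest-match-first lookup for a single word. It does not use libcudf and is not the library's implementation. The function name `wordpiece_ids`, the `##` continuation prefix, and the `unk_id` parameter are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of nvtext): greedy longest-match-first
// wordpiece lookup for one word. Works on bytes, so it assumes ASCII input.
// Pieces that do not start the word are looked up with a "##" prefix
// (an assumed convention for this sketch).
std::vector<int> wordpiece_ids(std::string const& word,
                               std::unordered_map<std::string, int> const& vocab,
                               int unk_id)
{
  std::vector<int> ids;
  std::size_t start = 0;
  while (start < word.size()) {
    std::size_t end = word.size();
    int found       = -1;
    // Try the longest remaining substring first, then shrink from the right.
    while (end > start) {
      std::string piece = word.substr(start, end - start);
      if (start > 0) { piece = "##" + piece; }
      auto const it = vocab.find(piece);
      if (it != vocab.end()) {
        found = it->second;
        break;
      }
      --end;
    }
    // No piece matches at this position: treat the whole word as unknown.
    if (found < 0) { return {unk_id}; }
    ids.push_back(found);
    start = end;
  }
  return ids;
}
```

In this sketch a word maps to several ids when only its sub-pieces are in the vocabulary, and to the single `unk_id` when some part of it cannot be matched at all.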