Tokenization

OpenVINO GenAI provides a way to tokenize and detokenize text using the ov::genai::Tokenizer class. The Tokenizer is a high-level abstraction over the OpenVINO Tokenizers library.

It can be initialized from a path, from an in-memory IR representation, or obtained from the ov::genai::LLMPipeline object.

import openvino_genai as ov_genai

# Path to the directory with the exported model and tokenizer IRs (placeholder)
models_path = "path/to/model_dir"

# Initialize from the path
tokenizer = ov_genai.Tokenizer(models_path)

# Or get the tokenizer instance from an LLMPipeline
pipe = ov_genai.LLMPipeline(models_path, "CPU")
tokenizer = pipe.get_tokenizer()

The Tokenizer has encode() and decode() methods that support the following arguments: add_special_tokens, skip_special_tokens, pad_to_max_length, and max_length.

Example - disable adding special tokens:

tokens = tokenizer.encode("The Sun is yellow because", add_special_tokens=False)
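The decode() method performs the reverse operation. A minimal sketch (assuming decode() accepts the input_ids tensor directly; skip_special_tokens defaults to True):

# Decode token ids back to text; special tokens (including padding) are skipped by default
texts = tokenizer.decode(tokens.input_ids, skip_special_tokens=True)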

The encode() method returns a TokenizedInputs object containing input_ids and attention_mask, both stored as ov::Tensor. Since ov::Tensor requires fixed-length sequences, padding is applied to match the longest sequence in a batch, ensuring a uniform shape. The resulting sequences are also truncated to max_length; if this value is not set by the user, it is taken from the IR.
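For instance, a short sketch inspecting both fields of the returned object (the .data attribute is the standard OpenVINO way to view a tensor as a NumPy array; the shapes in the comments are illustrative):

tokens = tokenizer.encode(["The Sun is yellow because", "The"])
print(tokens.input_ids.data)       # padded token ids, shape [batch, longest_sequence]
print(tokens.attention_mask.data)  # 1 for real tokens, 0 for padded positions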

Both padding and max_length can be controlled by the user. If pad_to_max_length is set to True, sequences are padded to max_length instead of to the longest sequence in the batch.

Example - control padding:

import openvino_genai as ov_genai

models_path = "path/to/model_dir"  # placeholder path to the exported tokenizer model
tokenizer = ov_genai.Tokenizer(models_path)
prompts = ["The Sun is yellow because", "The"]

# Since the prompts are shorter than max_length (which is taken from the IR), max_length does not affect the shape.
# The resulting shape is defined by the length of the longest token sequence.
# Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="longest", truncation=True)
tokens = tokenizer.encode(prompts)
# or, equivalently
tokens = tokenizer.encode(prompts, pad_to_max_length=False)
print(tokens.input_ids.shape)
# out_shape: [2, 6]

# The resulting tokens tensor will be padded to 1024; sequences exceeding this length will be truncated.
# Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="max_length", truncation=True, max_length=1024)
tokens = tokenizer.encode(["The Sun is yellow because",
                           "The",
                           "The longest string ever" * 2000], pad_to_max_length=True, max_length=1024)
print(tokens.input_ids.shape)
# out_shape: [3, 1024]

# For single-string prompts, truncation and padding are also applied.
tokens = tokenizer.encode("The Sun is yellow because", pad_to_max_length=True, max_length=128)
print(tokens.input_ids.shape)
# out_shape: [1, 128]
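As a round-trip check, decoding the padded result should recover the original prompt, since skip_special_tokens (enabled by default) drops the padding tokens; this sketch assumes decode() accepts the batched input_ids tensor and returns a list of strings:

texts = tokenizer.decode(tokens.input_ids, skip_special_tokens=True)
print(texts)
# expected: ['The Sun is yellow because']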