Semantic Search using Text Embedding
Convert and Optimize Model
Download and convert a text embedding model (e.g. BAAI/bge-small-en-v1.5) to OpenVINO format from Hugging Face:
optimum-cli export openvino --model BAAI/bge-small-en-v1.5 --trust-remote-code bge-small-en-v1_5_ov
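optimum-cli can also compress weights at export time as part of model optimization; a hedged variant of the same command producing int8 weights (assuming a recent optimum-intel release that supports the --weight-format option):

optimum-cli export openvino --model BAAI/bge-small-en-v1.5 --trust-remote-code --weight-format int8 bge-small-en-v1_5_ov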
See all supported Text Embedding Models.
Refer to the Model Preparation guide for detailed instructions on how to download, convert and optimize models for OpenVINO GenAI.
Run Model Using OpenVINO GenAI
TextEmbeddingPipeline generates vector representations for text using embedding models.
Python (CPU):
import openvino_genai as ov_genai

# Path produced by the optimum-cli export step above
models_path = "bge-small-en-v1_5_ov"
documents = ["The capital of France is Paris.", "OpenVINO GenAI provides a TextEmbeddingPipeline."]

pipeline = ov_genai.TextEmbeddingPipeline(
    models_path,
    "CPU",
    pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.MEAN,
    normalize=True,
)

documents_embeddings = pipeline.embed_documents(documents)
query_embeddings = pipeline.embed_query("What is the capital of France?")
Python (GPU):

import openvino_genai as ov_genai

# Path produced by the optimum-cli export step above
models_path = "bge-small-en-v1_5_ov"
documents = ["The capital of France is Paris.", "OpenVINO GenAI provides a TextEmbeddingPipeline."]

pipeline = ov_genai.TextEmbeddingPipeline(
    models_path,
    "GPU",
    pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.MEAN,
    normalize=True,
)

documents_embeddings = pipeline.embed_documents(documents)
query_embeddings = pipeline.embed_query("What is the capital of France?")
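The pipeline returns plain float vectors, so ranking documents for a query is a small post-processing step. A minimal sketch using NumPy (numpy is an extra dependency assumed here, not part of OpenVINO GenAI; it also assumes the returned embeddings convert cleanly to arrays):

import numpy as np

def rank_documents(query_embedding, documents_embeddings):
    # With normalize=True the vectors have unit length, so the dot
    # product equals cosine similarity.
    docs = np.asarray(documents_embeddings)
    query = np.asarray(query_embedding)
    scores = docs @ query
    # Indices of documents sorted from most to least similar
    return np.argsort(scores)[::-1]

order = rank_documents(query_embeddings, documents_embeddings)
print([documents[i] for i in order])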
C++ (CPU):
#include "openvino/genai/rag/text_embedding_pipeline.hpp"
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) try {
    // argv[1] is the model directory; the remaining arguments are documents to embed
    std::string models_path = argv[1];
    auto documents = std::vector<std::string>(argv + 2, argv + argc);

    ov::genai::TextEmbeddingPipeline pipeline(
        models_path,
        "CPU",
        ov::genai::pooling_type(ov::genai::TextEmbeddingPipeline::PoolingType::MEAN),
        ov::genai::normalize(true)
    );

    ov::genai::EmbeddingResults documents_embeddings = pipeline.embed_documents(documents);
    ov::genai::EmbeddingResult query_embedding = pipeline.embed_query("What is the capital of France?");
    return EXIT_SUCCESS;
} catch (const std::exception& error) {
    std::cerr << error.what() << std::endl;
    return EXIT_FAILURE;
}
C++ (GPU):

#include "openvino/genai/rag/text_embedding_pipeline.hpp"
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) try {
    // argv[1] is the model directory; the remaining arguments are documents to embed
    std::string models_path = argv[1];
    auto documents = std::vector<std::string>(argv + 2, argv + argc);

    ov::genai::TextEmbeddingPipeline pipeline(
        models_path,
        "GPU",
        ov::genai::pooling_type(ov::genai::TextEmbeddingPipeline::PoolingType::MEAN),
        ov::genai::normalize(true)
    );

    ov::genai::EmbeddingResults documents_embeddings = pipeline.embed_documents(documents);
    ov::genai::EmbeddingResult query_embedding = pipeline.embed_query("What is the capital of France?");
    return EXIT_SUCCESS;
} catch (const std::exception& error) {
    std::cerr << error.what() << std::endl;
    return EXIT_FAILURE;
}
You can switch between CPU and GPU devices without any other code changes.
Additional Usage Options
Pooling Strategies
Text embedding models support different pooling strategies to aggregate token embeddings into a single vector:
CLS
: Use the first token embedding (default for many models)MEAN
: Average all token embeddings
You can set the pooling strategy via the pooling_type
parameter.
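For example, a minimal sketch selecting CLS pooling instead of MEAN (reusing the models_path from the setup above):

import openvino_genai as ov_genai

models_path = "bge-small-en-v1_5_ov"
pipeline = ov_genai.TextEmbeddingPipeline(
    models_path,
    "CPU",
    pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.CLS,
)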
L2 Normalization
L2 normalization can be applied to the output embeddings for improved retrieval performance. Enable it with the normalize parameter.
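With normalize=True, each returned vector should have approximately unit L2 norm, which is what makes the dot-product ranking sketched earlier equivalent to cosine similarity. A quick check, again assuming NumPy is available and the pipeline from the setup above:

import numpy as np

embedding = pipeline.embed_query("What is the capital of France?")
print(np.linalg.norm(embedding))  # expected to be ~1.0 when normalize=True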
Query and Embed Instructions
Some models support special instructions for queries and documents. Use the query_instruction and embed_instruction parameters to provide these if needed.
Example: Custom Configuration
Python:
import openvino_genai as ov_genai

pipeline = ov_genai.TextEmbeddingPipeline(
    models_path,
    "CPU",
    pooling_type=ov_genai.TextEmbeddingPipeline.PoolingType.MEAN,
    normalize=True,
    query_instruction="Represent this sentence for searching relevant passages: ",
    embed_instruction="Represent this passage for retrieval: "
)
C++:

#include "openvino/genai/rag/text_embedding_pipeline.hpp"

ov::genai::TextEmbeddingPipeline pipeline(
    models_path,
    "CPU",
    ov::genai::pooling_type(ov::genai::TextEmbeddingPipeline::PoolingType::MEAN),
    ov::genai::normalize(true),
    ov::genai::query_instruction("Represent this sentence for searching relevant passages: "),
    ov::genai::embed_instruction("Represent this passage for retrieval: ")
);
For the full list of configuration options, see the TextEmbeddingPipeline API Reference.