Image Processing Using VLMs

Convert and Optimize Model

Download and convert a model (e.g. openbmb/MiniCPM-V-2_6) from Hugging Face to the OpenVINO format:

optimum-cli export openvino --model openbmb/MiniCPM-V-2_6 --weight-format int4 --trust-remote-code MiniCPM_V_2_6_ov

See all supported Visual Language Models.

info

Refer to the Model Preparation guide for detailed instructions on how to download, convert and optimize models for OpenVINO GenAI.

Run Model Using OpenVINO GenAI

OpenVINO GenAI provides the VLMPipeline for inference of multimodal, text-generating Vision Language Models (VLMs). It generates text from a text prompt and one or more images as inputs.

import openvino_genai as ov_genai
import openvino as ov
from PIL import Image
import numpy as np
from pathlib import Path

def read_image(path: str) -> ov.Tensor:
    # Load the image as RGB and add a batch dimension
    pic = Image.open(path).convert("RGB")
    image_data = np.array(pic)[None]
    return ov.Tensor(image_data)

def read_images(path: str) -> list[ov.Tensor]:
    entry = Path(path)
    if entry.is_dir():
        # Read every file in the directory as an image
        return [read_image(str(file)) for file in sorted(entry.iterdir())]
    return [read_image(path)]

model_path = "MiniCPM_V_2_6_ov"  # output directory of the optimum-cli command above
prompt = "Describe the images."  # example prompt

images = read_images("./images")

pipe = ov_genai.VLMPipeline(model_path, "CPU")
result = pipe.generate(prompt, images=images, max_new_tokens=100)
print(result.texts[0])
tip

Use CPU or GPU as the device without any other code changes.
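
For example, switching to GPU only requires changing the device string (a minimal sketch, reusing the model_path from the example above):

import openvino_genai as ov_genai

# Same pipeline code; only the device string changes
pipe = ov_genai.VLMPipeline("MiniCPM_V_2_6_ov", "GPU")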

Additional Usage Options

tip

Check out Python and C++ visual language chat samples.

Use Different Generation Parameters

Similar to text generation, VLM pipelines support various generation parameters to control the text output.

Generation Configuration Workflow

  1. Get the model default config with get_generation_config()
  2. Modify parameters
  3. Apply the updated config using one of the following methods (see the sketches below):
    • Use set_generation_config(config)
    • Pass config directly to generate() (e.g. generate(prompt, config))
    • Specify options as inputs in the generate() method (e.g. generate(prompt, max_new_tokens=100))
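
Passing the config object to generate() is shown in the next example; the other two options look roughly like this (a minimal sketch, reusing the pipe, prompt, and images from the earlier example):

# Set the config on the pipeline once; later generate() calls use it
config = pipe.get_generation_config()
config.max_new_tokens = 100
pipe.set_generation_config(config)
output = pipe.generate(prompt, images=images)

# Or pass individual options as keyword arguments to generate()
output = pipe.generate(prompt, images=images, max_new_tokens=100)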

Basic Generation Configuration

import openvino_genai as ov_genai

# model_path, prompt, and images as in the example above
pipe = ov_genai.VLMPipeline(model_path, "CPU")

# Get default configuration
config = pipe.get_generation_config()

# Modify parameters
config.max_new_tokens = 100
config.do_sample = True  # enable sampling so temperature/top_k/top_p take effect
config.temperature = 0.7
config.top_k = 50
config.top_p = 0.9
config.repetition_penalty = 1.2

# Generate text with custom configuration
output = pipe.generate(prompt, images=images, generation_config=config)

Understanding Basic Generation Parameters

  • max_new_tokens: The maximum number of tokens to generate, excluding the tokens in the prompt. max_new_tokens takes priority over max_length.
  • temperature: Controls the level of creativity in AI-generated text:
    • Low temperature (e.g. 0.2) leads to more focused and deterministic output, choosing tokens with the highest probability.
    • Medium temperature (e.g. 1.0) maintains a balance between creativity and focus, selecting tokens based on their probabilities without significant bias.
    • High temperature (e.g. 2.0) makes output more creative and adventurous, increasing the chances of selecting less likely tokens.
  • top_k: Limits token selection to the k most likely next tokens. Higher values allow more diverse outputs.
  • top_p: Selects from the smallest set of tokens whose cumulative probability exceeds p. Helps balance diversity and quality.
  • repetition_penalty: Penalizes repeated tokens. Values above 1.0 discourage repetition.
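
Note that the sampling parameters above (temperature, top_k, top_p) take effect only when sampling is enabled via do_sample. For comparison, a minimal sketch of deterministic decoding, reusing the pipe, prompt, and images from above:

# Greedy decoding: with do_sample=False, temperature/top_k/top_p are ignored
config = pipe.get_generation_config()
config.do_sample = False
config.max_new_tokens = 100
output = pipe.generate(prompt, images=images, generation_config=config)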

For the full list of generation parameters, refer to the Generation Config API.

Using OpenVINO GenAI in Chat Scenario

Refer to the Chat Scenario guide for more information on using OpenVINO GenAI in chat applications.
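
As a quick reference, a minimal chat sketch with VLMPipeline (assuming the pipe and images from above; start_chat() and finish_chat() preserve conversation history between generate() calls):

pipe.start_chat()
# The first turn includes the image(s)
result = pipe.generate("What is on the image?", images=images, max_new_tokens=100)
print(result.texts[0])
# Follow-up turns reuse the accumulated chat history
result = pipe.generate("What color is it?", max_new_tokens=100)
print(result.texts[0])
pipe.finish_chat()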

Streaming the Output

Refer to the Streaming guide for more information on streaming the output with OpenVINO GenAI.
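
For a quick impression, a minimal streaming sketch (reusing the pipe, prompt, and images from above): the streamer callback receives generated text chunks as they become available:

# Print each chunk immediately; returning None continues generation
def streamer(subword: str):
    print(subword, end="", flush=True)

pipe.generate(prompt, images=images, max_new_tokens=100, streamer=streamer)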