JS vlm_chat_sample that supports VLM models
This example showcases inference of text-generation Vision Language Models (VLMs): MiniCPM-V-2_6 and other models with the same signature. The application deliberately has few configuration options, to encourage the reader to explore and modify the source code, for example by changing the inference device to GPU. The sample features openvino-genai-node.VLMPipeline and configures it for the chat scenario.
There is one sample file:
visual_language_chat.js demonstrates basic usage of the VLM pipeline, which supports accelerated inference using prompt lookup decoding.
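The chat scenario mentioned above can be sketched as a loop that opens a session, sends one prompt per turn, and closes the session. This is a minimal sketch, not the code of visual_language_chat.js. The method names startChat, generate and finishChat, the images option, and the choice to pass images only on the first turn are assumptions made for illustration and should be checked against the openvino-genai-node sources. makeStubPipeline is a hypothetical stand-in so the sketch runs without a model.

```javascript
// Sketch of a chat session. `pipe` is any object exposing startChat(),
// generate() and finishChat(); these names are assumptions, not confirmed
// openvino-genai-node signatures.
async function runChat(pipe, prompts, images) {
  const replies = [];
  await pipe.startChat();
  try {
    for (let i = 0; i < prompts.length; i++) {
      // Assumption: images accompany the first prompt only; later turns
      // rely on the history kept by the chat session.
      const turnImages = i === 0 ? images : [];
      replies.push(await pipe.generate(prompts[i], { images: turnImages }));
    }
  } finally {
    // Close the session even if a turn throws.
    await pipe.finishChat();
  }
  return replies;
}

// Hypothetical stand-in pipeline that echoes prompts instead of running a model.
function makeStubPipeline() {
  const history = [];
  let active = false;
  return {
    async startChat() { active = true; history.length = 0; },
    async generate(prompt, { images = [] } = {}) {
      if (!active) throw new Error('chat not started');
      history.push(prompt);
      return `turn ${history.length}: "${prompt}" (${images.length} image(s))`;
    },
    async finishChat() { active = false; },
    isActive: () => active,
  };
}
```

With a real pipeline object in place of the stub, the same loop shape would apply, provided the actual API matches the assumed method names.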
Install JS dependencies
Install Node.js dependencies from samples/js:
cd samples/js
npm install
Download and convert the model and tokenizers
Install ../../export-requirements.txt to convert a model.
The --upgrade-strategy eager option is needed to ensure optimum-intel is upgraded to the latest version.
pip install --upgrade-strategy eager -r ../export-requirements.txt
Then, run the export with Optimum CLI:
optimum-cli export openvino --model openbmb/MiniCPM-V-2_6 --trust-remote-code MiniCPM-V-2_6
Run image-to-text chat sample
The image named in the command below (319483352-d5fbbd1a-d484-415c-88cb-9986625b7b11.jpg) can be used as a sample image.
cd samples/js
node visual_language_chat/visual_language_chat.js ./MiniCPM-V-2_6/ 319483352-d5fbbd1a-d484-415c-88cb-9986625b7b11.jpg
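The command above passes two positional arguments: the exported model directory and an image file. As a rough illustration of how a script could read and check them, here is a minimal sketch. parseSampleArgs and the usage string are hypothetical and are not taken from visual_language_chat.js, which may handle its arguments differently.

```javascript
// Hypothetical helper: extracts the two positional arguments from an
// argv array shaped like Node's process.argv.
function parseSampleArgs(argv) {
  // argv[0] is the node binary and argv[1] is the script path,
  // so the user-supplied arguments start at index 2.
  const [modelDir, imagePath] = argv.slice(2);
  if (!modelDir || !imagePath) {
    throw new Error('Usage: node visual_language_chat.js <MODEL_DIR> <IMAGE_FILE>');
  }
  return { modelDir, imagePath };
}
```

In a script this would be called as parseSampleArgs(process.argv); failing early with a usage message avoids a less readable error later when the model directory or image is loaded.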
See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.