OpenVINO GenAI Video Generation Python Samples
These samples showcase the use of OpenVINO's inference capabilities for video generation tasks. The samples feature `openvino_genai.Text2VideoPipeline` for generating videos from text prompts using models like LTX-Video.
The applications deliberately expose few configuration options, to encourage the reader to explore and modify the source code. For example, change the inference device to GPU.
- `text2video.py` demonstrates basic text-to-video generation.
- `lora_text2video.py` demonstrates text-to-video generation with LoRA adapters.
- `taylorseer_text2video.py` demonstrates text-to-video generation with TaylorSeer caching optimization for improved performance. Only the LTX-Video model is supported.
Download and Convert the Model
Install `../../export-requirements.txt` if model conversion is required. The `--upgrade-strategy eager` option ensures `optimum-intel` is upgraded to its latest version:

```shell
pip install --upgrade-strategy eager -r ../../export-requirements.txt
```
Then, run the export with Optimum CLI:

```shell
optimum-cli export openvino --model Lightricks/LTX-Video --task text-to-video --weight-format fp32 ltx_video_ov/FP32
```
Note: For basic video generation without LoRA, `--weight-format int8` produces a smaller model.
Alternatively, do it in Python code:

```python
from optimum.intel.openvino import OVLTXPipeline

pipeline = OVLTXPipeline.from_pretrained("Lightricks/LTX-Video", export=True, compile=False)
pipeline.save_pretrained("ltx_video_ov/FP32")
```
Sample Descriptions
Common Information
Follow Get Started with Samples for common information about OpenVINO samples, and follow the build instructions to build the GenAI samples.
GPUs usually provide better performance than CPUs. Modify the source code to change the inference device to GPU.
Install `../../deployment-requirements.txt` to run the samples:

```shell
pip install --upgrade-strategy eager -r ../../deployment-requirements.txt
```
Text to Video Sample (text2video.py)
- **Description:** Basic video generation using a text-to-video model. This sample demonstrates how to generate videos from text prompts using the OpenVINO GenAI `Text2VideoPipeline`. The LTX-Video model is recommended for this sample.
- **Recommended models:** Lightricks/LTX-Video
- **Main Feature:** Generate videos from text descriptions with customizable parameters.
- **Run Command:**
  ```shell
  python text2video.py model_dir prompt [--device DEVICE] [--output OUTPUT]
  ```
  Example:
  ```shell
  python text2video.py ./ltx_video_ov/FP32 "A woman with long brown hair and light skin smiles at another woman with long blonde hair"
  ```
LoRA Text to Video Sample (lora_text2video.py)
- **Description:** Video generation with LoRA adapters using a text-to-video model. This sample demonstrates how to generate videos from text prompts while applying a LoRA adapter.
- **Recommended models:** Lightricks/LTX-Video
  To download the LoRA adapter used in the example below:
  ```shell
  huggingface-cli download svjack/ltx_video_pixel_early_lora ltx_pixel_pytorch_lora_weights.safetensors
  ```
- **Main Feature:** Apply a LoRA adapter to a text-to-video pipeline for customized generation.
- **Run Command:**
  ```shell
  python lora_text2video.py model_dir prompt [lora_adapter_path alpha] ...
  ```
  Example:
  ```shell
  python lora_text2video.py ./ltx_video_ov/FP32 "In the style of Pixel, the video shifts to a majestic castle under a starry sky." ltx_pixel_pytorch_lora_weights.safetensors 3.0
  ```

The sample generates two video files, `lora_video.avi` and `baseline_video.avi`, in the current directory.
Users can modify the source code to experiment with different generation parameters:
- Change width or height of generated video
- Change number of frames
- Generate multiple videos per prompt
- Adjust number of inference steps
- Play with guidance scale (improves quality when > 1)
- Add negative prompt when guidance scale > 1
- Adjust frame rate
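As a sketch, the parameters listed above map onto keyword arguments of the pipeline's `generate()` call. The argument names below follow common diffusers-style conventions and are assumptions; verify them against the sample source before use:

```python
# Hypothetical generation parameters; verify the names against text2video.py.
generation_params = dict(
    width=512,               # divisible by 32 (an LTX-Video constraint)
    height=512,
    num_frames=49,           # of the form 8k + 1: 8 * 6 + 1
    num_inference_steps=25,  # at least 2 steps
    guidance_scale=3.0,      # > 1 improves quality and enables the negative prompt
    negative_prompt="low quality, blurry",
)
# With a converted model available, the call would look like:
# video = pipe.generate(prompt, **generation_params).video
```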
Run with threaded callback
You can also implement a callback function that runs in a separate thread. This allows for parallel processing, enabling you to interrupt generation early if intermediate results are already satisfactory, or to add logging.
A template of the callback usage:
```python
pipe = openvino_genai.Text2VideoPipeline(model_dir, device)

def callback(step, num_steps, latent):
    print(f"Video generation step: {step + 1} / {num_steps}")
    if your_condition:  # return True if you want to interrupt video generation
        return True
    return False

video = pipe.generate(
    prompt,
    callback=callback,
).video
```
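As a minimal sketch of the threaded-interrupt idea, a `threading.Event` can serve as the interrupt condition: the event is set from another thread (for example, a UI handler or watchdog), and the callback checks it on every step. The callback signature follows the template above:

```python
import threading

# Set this event from another thread to request early termination.
stop_requested = threading.Event()

def callback(step, num_steps, latent):
    print(f"Video generation step: {step + 1} / {num_steps}")
    # Returning True interrupts video generation.
    return stop_requested.is_set()
```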
TaylorSeer Text to Video Sample (taylorseer_text2video.py)
- **Description:** Generate videos with TaylorSeer caching optimization. This sample runs two generations, one baseline without caching and one with TaylorSeer caching enabled, then compares their performance.
- **Run Command:**
  ```shell
  python taylorseer_text2video.py model_dir prompt
  ```
  Example:
  ```shell
  python taylorseer_text2video.py ./ltx_video_ov/INT8 "a robot dancing in the rain"
  ```

The sample generates two video files, `taylorseer_baseline.avi` (without caching) and `taylorseer.avi` (with caching), and displays a performance comparison showing the achieved speedup.
The TaylorSeer configuration parameters can be adjusted in the source code:
- `cache_interval`: number of steps between cache updates (default: 3)
- `disable_cache_before_step`: disable caching before this step, for warmup (default: 6)
- `disable_cache_after_step`: disable caching after this step (default: -2, meaning 2 steps before the end)
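A negative `disable_cache_after_step` counts from the end of the step schedule. The small helper below is purely illustrative (not part of the library) and makes the resolution explicit:

```python
def resolve_step(step: int, num_steps: int) -> int:
    """Resolve a possibly negative step index against the total step count."""
    return step if step >= 0 else num_steps + step

# With 25 inference steps, the default of -2 resolves to step 23,
# i.e. caching is disabled for the last 2 steps.
```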
For more details about TaylorSeer, see the diffusion caching documentation.
Troubleshooting
LTX-Video Model Constraints
[!NOTE] The LTX-Video model works best on:
- Resolutions divisible by 32 (e.g., 480x704, 512x512, 720x1280)
- A number of frames of the form 8n + 1 (e.g., 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121, 161, 257)
- At least 2 inference steps (1 step may produce artifacts)
- Best quality achieved with resolutions under 720x1280 and number of frames below 257
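These constraints are easy to enforce programmatically when experimenting with the generation parameters. The helpers below are illustrative, not part of the samples:

```python
def nearest_valid_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap a resolution to the nearest multiples of 32."""
    snap = lambda v: max(32, round(v / 32) * 32)
    return snap(width), snap(height)

def nearest_valid_num_frames(n: int) -> int:
    """Snap a frame count to the nearest value of the form 8k + 1."""
    return max(9, round((n - 1) / 8) * 8 + 1)
```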
OpenCV Installation
If you encounter issues with OpenCV when running the samples, ensure it is properly installed:

```shell
pip install opencv-python==4.12.0.88
```
This dependency is included in ../../deployment-requirements.txt.
Unicode characters encoding error on Windows
Example error:
```
UnicodeEncodeError: 'charmap' codec can't encode character '\u25aa' in position 0: character maps to <undefined>
```
If you encounter this error while a sample prints output to the Windows console, the default Windows encoding likely does not support certain Unicode characters. To resolve this:
- Enable Unicode characters for the Windows cmd console: open **Region** settings from **Control Panel**, go to **Administrative** -> **Change system locale**, check **Beta: Use Unicode UTF-8 for worldwide language support**, click **OK**, and reboot.
- Alternatively, enable UTF-8 mode by setting the environment variable `PYTHONIOENCODING="utf8"`.
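The root cause can be reproduced independently of the samples: a character such as `'\u25aa'` encodes fine as UTF-8 but fails under a legacy Windows "charmap" code page such as cp1252:

```python
char = "\u25aa"  # BLACK SMALL SQUARE, used by some console progress indicators

print(char.encode("utf-8"))  # UTF-8 can represent it

try:
    char.encode("cp1252")  # a legacy Windows 'charmap' code page
except UnicodeEncodeError as err:
    print("cp1252 cannot encode it:", err)
```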
Support and Contribution
- For troubleshooting, consult the OpenVINO documentation.
- To report issues or contribute, visit the GitHub repository.