Whisper Automatic Speech Recognition C Sample

Table of Contents

  1. Download OpenVINO GenAI
  2. Build Samples
  3. Download and Convert the Model
  4. Prepare Audio File
  5. Sample Description
  6. Troubleshooting
  7. Support and Contribution

Download OpenVINO GenAI

Download the OpenVINO GenAI archive from the OpenVINO Download Page and extract it.

Build Samples

Set up the environment and build the samples.

Linux and macOS:

source <INSTALL_DIR>/setupvars.sh
./<INSTALL_DIR>/samples/c/build_samples.sh

Windows Command Prompt:

<INSTALL_DIR>\setupvars.bat
<INSTALL_DIR>\samples\c\build_samples_msvc.bat

Windows PowerShell:

. <INSTALL_DIR>\setupvars.ps1
. <INSTALL_DIR>\samples\c\build_samples.ps1

Download and Convert the Model

If the model needs to be converted, install the dependencies listed in ../../export-requirements.txt. The --upgrade-strategy eager option ensures that optimum-intel is upgraded to the latest version.

pip install --upgrade-strategy eager -r ../../export-requirements.txt
optimum-cli export openvino --trust-remote-code --model openai/whisper-tiny whisper-tiny

If a converted model in OpenVINO IR format is available in the OpenVINO optimized models collection on Hugging Face, you can download it directly via huggingface-cli.

For example:

pip install huggingface-hub
huggingface-cli download OpenVINO/whisper-tiny-int8-ov --local-dir whisper-tiny-int8-ov

Prepare Audio File

Prepare an audio file in WAV format with a sampling rate of 16 kHz.

You can download an example audio file: https://storage.openvinotoolkit.org/models_contrib/speech/2021.2/librispeech_s5/how_are_you_doing_today.wav

Sample Description

This sample demonstrates speech recognition inference with Whisper models using the OpenVINO GenAI C API. It is built around ov_genai_whisper_pipeline and takes an audio file in WAV format as input.

Run Command

./whisper_speech_recognition_c <MODEL_DIR> "<WAV_FILE_PATH>" [DEVICE]

Parameters

  • MODEL_DIR: Path to the converted Whisper model directory
  • WAV_FILE_PATH: Path to the WAV audio file (use quotes if path contains spaces)
  • DEVICE: Optional - device to run inference on (default: "CPU")
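The parameter handling described above can be sketched as follows. This is a minimal illustration, not the sample's actual source: the sample_args struct and the parse_args function are hypothetical names introduced here, and only the argument order and the "CPU" default come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for the three command-line parameters. */
typedef struct {
    const char *model_dir; /* MODEL_DIR */
    const char *wav_path;  /* WAV_FILE_PATH */
    const char *device;    /* DEVICE, or "CPU" when omitted */
} sample_args;

/* Fills *out from argv. Returns 0 on success, -1 on a wrong argument count. */
static int parse_args(int argc, char **argv, sample_args *out) {
    assert(out != NULL);
    if (argc < 3 || argc > 4) {
        fprintf(stderr, "Usage: %s <MODEL_DIR> \"<WAV_FILE_PATH>\" [DEVICE]\n",
                argc > 0 ? argv[0] : "whisper_speech_recognition_c");
        return -1;
    }
    out->model_dir = argv[1];
    out->wav_path = argv[2];
    /* The third parameter is optional; fall back to the documented default. */
    out->device = (argc == 4) ? argv[3] : "CPU";
    return 0;
}
```

The shell, not the program, handles the quotes around a path with spaces, so argv[2] already holds the complete path.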

Example Usage

./whisper_speech_recognition_c whisper-tiny how_are_you_doing_today.wav

Expected Output

 How are you doing today?
timestamps: [0.00, 2.00] text: How are you doing today?
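For readers adapting the sample, the timestamp line above could be produced by a formatter like the one below. This is a sketch under assumptions: the text_chunk struct and format_chunk function are hypothetical and do not come from the OpenVINO GenAI C API, and the format string is inferred only from the expected output shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one transcribed chunk. */
typedef struct {
    float start_ts;   /* chunk start, in seconds */
    float end_ts;     /* chunk end, in seconds */
    const char *text; /* transcribed text of the chunk */
} text_chunk;

/* Writes one "timestamps: [...] text: ..." line into buf.
   Returns 0 on success, -1 if the line did not fit. */
static int format_chunk(const text_chunk *c, char *buf, size_t buf_size) {
    assert(c != NULL && buf != NULL);
    int n = snprintf(buf, buf_size, "timestamps: [%.2f, %.2f] text: %s",
                     c->start_ts, c->end_ts, c->text);
    return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```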

The sample will:

  1. Load the WAV audio file and validate its format
  2. Automatically resample to 16kHz if needed
  3. Perform speech-to-text transcription
  4. Output the full transcription
  5. Display word-level timestamps for each text chunk
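Step 2 can be illustrated with a small, self-contained sketch. The resampling method the sample actually uses is not stated here, so linear interpolation is an assumption made for illustration, and pcm16_to_float and resample_linear are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TARGET_RATE 16000u /* sampling rate expected by the sample, in Hz */

/* Converts 16-bit PCM samples to floats in the range [-1, 1). */
static void pcm16_to_float(const int16_t *in, size_t n, float *out) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)in[i] / 32768.0f;
    }
}

/* Resamples mono audio from in_rate to TARGET_RATE by linear interpolation.
   Returns a malloc'd buffer (the caller frees it) and stores its length in
   *out_len. Returns NULL on empty input, zero rate, or allocation failure. */
static float *resample_linear(const float *in, size_t in_len,
                              uint32_t in_rate, size_t *out_len) {
    assert(in != NULL && out_len != NULL);
    *out_len = 0;
    if (in_len == 0 || in_rate == 0) return NULL;

    size_t n = (size_t)(((uint64_t)in_len * TARGET_RATE) / in_rate);
    if (n == 0) return NULL;
    float *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;

    double step = (double)in_rate / (double)TARGET_RATE; /* input samples per output sample */
    for (size_t i = 0; i < n; ++i) {
        double pos = (double)i * step;
        size_t i0 = (size_t)pos;
        if (i0 >= in_len) i0 = in_len - 1;               /* guard against rounding */
        size_t i1 = (i0 + 1 < in_len) ? i0 + 1 : i0;     /* clamp at the last sample */
        double frac = pos - (double)i0;
        out[i] = (float)((1.0 - frac) * in[i0] + frac * in[i1]);
    }
    *out_len = n;
    return out;
}
```

Plain linear interpolation applies no low-pass filter, so downsampling from a higher rate can introduce aliasing. Where transcription quality matters, a production resampler would filter the signal first.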

Troubleshooting

Empty or Incorrect Output

If you get empty or incorrect transcription results:

  • Ensure your audio file is in WAV format
  • Check that the audio contains clear speech

Model Loading Errors

If the model fails to load:

  • Verify the model path exists and contains valid Whisper model files
  • Ensure the model was properly converted to OpenVINO IR format
  • Check that the specified device (CPU, GPU, etc.) is available on your system

Audio File Errors

The sample provides detailed error messages for common audio file issues:

  • File not found
  • Permission denied
  • Invalid WAV format
  • Unsupported audio encoding (only PCM is supported)
  • Multi-channel audio (only mono is supported)
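The last three checks in the list can be approximated with a short header parser. This is a minimal sketch that assumes the standard RIFF/WAVE layout; it is not the sample's actual validation code, and wav_status and check_wav_header are hypothetical names. The first two errors, file not found and permission denied, would normally be detected when opening the file, for example by inspecting errno after fopen fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    WAV_OK = 0,
    WAV_INVALID_FORMAT,       /* not a RIFF/WAVE file, or no usable "fmt " chunk */
    WAV_UNSUPPORTED_ENCODING, /* format tag is not 1 (PCM) */
    WAV_NOT_MONO              /* channel count is not 1 */
} wav_status;

/* Little-endian readers; WAV headers are little-endian by definition. */
static uint16_t rd_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Checks an in-memory WAV header and stores the sampling rate on success. */
static wav_status check_wav_header(const uint8_t *buf, size_t len,
                                   uint32_t *sample_rate) {
    assert(buf != NULL && sample_rate != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) {
        return WAV_INVALID_FORMAT;
    }
    size_t pos = 12; /* first chunk follows the 12-byte RIFF header */
    while (pos + 8 <= len) {
        uint32_t size = rd_u32le(buf + pos + 4);
        size_t remaining = len - (pos + 8);
        if (memcmp(buf + pos, "fmt ", 4) == 0) {
            if (size < 16 || remaining < 16) return WAV_INVALID_FORMAT;
            const uint8_t *fmt = buf + pos + 8;
            if (rd_u16le(fmt) != 1) return WAV_UNSUPPORTED_ENCODING;
            if (rd_u16le(fmt + 2) != 1) return WAV_NOT_MONO;
            *sample_rate = rd_u32le(fmt + 4);
            return WAV_OK;
        }
        /* Skip any other chunk; chunk bodies are padded to an even size. */
        size_t skip = (size_t)size + (size & 1u);
        if (skip > remaining) break;
        pos += 8 + skip;
    }
    return WAV_INVALID_FORMAT;
}
```

Working on a byte buffer rather than a FILE pointer keeps the check easy to test. A caller would read the first bytes of the file into a buffer and pass them in.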

Support and Contribution