
Supported Models

info

Other models with similar architectures may also work even if not explicitly validated. Consider testing any unlisted model to verify compatibility with your specific use case.
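One way to test an unlisted model is to export it to OpenVINO IR with optimum-intel and try it in the pipeline. A minimal sketch, assuming `optimum[openvino]` is installed; the helper name and model id in the usage note are illustrative, not part of the pipeline API:

```python
def export_to_openvino(model_id: str, output_dir: str) -> None:
    """Export a Hugging Face causal-LM checkpoint to OpenVINO IR (sketch).

    Assumes `pip install optimum[openvino]` and that the architecture is
    supported by optimum-intel's exporter.
    """
    from optimum.intel import OVModelForCausalLM  # deferred: optional dependency

    model = OVModelForCausalLM.from_pretrained(model_id, export=True)
    model.save_pretrained(output_dir)
```

For example, `export_to_openvino("TinyLlama/TinyLlama-1.1B-Chat-v1.0", "tinyllama-ov")` would export one of the listed Llama-family models.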

Large Language Models (LLMs)

| Architecture | Models |
| --- | --- |
| AquilaModel | Aquila |
| ArcticForCausalLM | Snowflake |
| BaichuanForCausalLM | Baichuan2 |
| BloomForCausalLM | Bloom, Bloomz |
| ChatGLMModel | ChatGLM |
| CodeGenForCausalLM | CodeGen |
| CohereForCausalLM | Aya, C4AI Command R |
| DbrxForCausalLM | DBRX |
| DeciLMForCausalLM | DeciLM |
| DeepseekForCausalLM | DeepSeek-MoE |
| DeepseekV2ForCausalLM | DeepSeekV2 |
| DeepseekV3ForCausalLM | DeepSeekV3 |
| ExaoneForCausalLM | Exaone |
| FalconForCausalLM | Falcon |
| GemmaForCausalLM | Gemma |
| Gemma2ForCausalLM | Gemma2 |
| Gemma3ForCausalLM | Gemma3 |
| GlmForCausalLM | GLM |
| GPT2LMHeadModel | GPT2, CodeParrot |
| GPTBigCodeForCausalLM | StarCoder |
| GPTJForCausalLM | GPT-J |
| GPTNeoForCausalLM | GPT Neo |
| GPTNeoXForCausalLM | GPT NeoX, Dolly, RedPajama |
| GPTNeoXJapaneseForCausalLM | GPT NeoX Japanese |
| GraniteForCausalLM | Granite |
| GraniteMoeForCausalLM | GraniteMoE |
| InternLMForCausalLM | InternLM |
| InternLM2ForCausalLM | InternLM2 |
| JAISLMHeadModel | Jais |
| LlamaForCausalLM | Llama 3, Llama 2, Falcon3, OpenLLaMA, TinyLlama |
| MPTForCausalLM | MPT |
| MiniCPMForCausalLM | MiniCPM |
| MiniCPM3ForCausalLM | MiniCPM3 |
| MistralForCausalLM | Mistral, Notus, Zephyr, Neural Chat |
| MixtralForCausalLM | Mixtral |
| OlmoForCausalLM | OLMo |
| OPTForCausalLM | OPT |
| OrionForCausalLM | Orion |
| PhiForCausalLM | Phi |
| Phi3ForCausalLM | Phi3 |
| QWenLMHeadModel | Qwen |
| Qwen2ForCausalLM | Qwen2 |
| Qwen2MoeForCausalLM | Qwen2MoE |
| StableLmForCausalLM | StableLM |
| Starcoder2ForCausalLM | Starcoder2 |
| XGLMForCausalLM | XGLM |
| XverseForCausalLM | Xverse |
info

LoRA adapters are supported.
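A minimal sketch of attaching a LoRA adapter with the openvino_genai Python API, following the pattern of the GenAI LoRA sample; the function name and paths are placeholders:

```python
def generate_with_lora(models_path: str, adapter_path: str, prompt: str) -> str:
    """Run generation with a LoRA adapter attached (sketch; paths are placeholders)."""
    import openvino_genai  # deferred: requires the openvino-genai package

    adapter = openvino_genai.Adapter(adapter_path)
    adapter_config = openvino_genai.AdapterConfig(adapter)
    # Register the adapter at construction time, then enable it per generate() call.
    pipe = openvino_genai.LLMPipeline(models_path, "CPU", adapters=adapter_config)
    return pipe.generate(prompt, max_new_tokens=100, adapters=adapter_config)
```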

info

The pipeline can work with other similar topologies produced by optimum-intel that share the same model signature. After conversion, the model is required to have the following inputs:

  1. input_ids contains the token ids.
  2. attention_mask is filled with 1.
  3. beam_idx selects beams.
  4. position_ids (optional) encodes the position of the currently generated token in the sequence.

The model must also produce a single logits output.
note

Models should belong to the same family and use the same tokenizer.
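Whether a converted model exposes the expected signature can be checked by reading the IR and listing its input names. A small sketch, assuming the openvino package is installed and `xml_path` points at a converted model; the helper name is illustrative:

```python
def check_signature(xml_path: str) -> bool:
    """Return True if the converted model exposes the inputs the pipeline expects."""
    import openvino as ov  # deferred: requires the openvino package

    model = ov.Core().read_model(xml_path)
    names = {name for inp in model.inputs for name in inp.get_names()}
    required = {"input_ids", "attention_mask", "beam_idx"}  # position_ids is optional
    return required.issubset(names)
```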

Image Generation Models

The following architectures are supported (support for text-to-image, image-to-image, inpainting, and LoRA varies per architecture):

  • Latent Consistency Model
  • Stable Diffusion
  • Stable Diffusion Inpainting
  • Stable Diffusion XL
  • Stable Diffusion XL Inpainting
  • Stable Diffusion 3
  • Flux
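A minimal text-to-image sketch with the openvino_genai API, following the Text2ImagePipeline sample; the function name and model path are placeholders:

```python
def text_to_image(models_path: str, prompt: str, out_path: str = "image.png") -> None:
    """Generate one image from a prompt (sketch; models_path is a placeholder)."""
    import openvino_genai  # deferred: requires the openvino-genai package
    from PIL import Image  # deferred: used only to save the result

    pipe = openvino_genai.Text2ImagePipeline(models_path, "CPU")
    image_tensor = pipe.generate(prompt, width=512, height=512, num_inference_steps=20)
    Image.fromarray(image_tensor.data[0]).save(out_path)
```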

Visual Language Models (VLMs)

| Architecture | Models |
| --- | --- |
| InternVL2 | InternVL2 (see notes) |
| LLaVA | LLaVA-v1.5 |
| LLaVA-NeXT | LLaVA-v1.6 |
| MiniCPMV | MiniCPM-V-2_6 |
| Phi3VForCausalLM | phi3_v (see notes) |
| Qwen2-VL | Qwen2-VL |
VLM Models Notes

InternVL2

To convert InternVL2 models, timm and einops are required:

pip install timm einops

phi3_v

  • GPU isn't supported.
  • The example models' configs aren't consistent, so the default eos_token_id must be overridden with the one from the tokenizer:
    generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())
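In context, the override can look like the following sketch (the function name, model path, and prompt are placeholders; the image is expected as an OpenVINO tensor):

```python
def run_phi3_v(models_path: str, prompt: str, image) -> str:
    """VLM generation with the eos_token_id override phi3_v needs (sketch)."""
    import openvino_genai  # deferred: requires the openvino-genai package

    pipe = openvino_genai.VLMPipeline(models_path, "CPU")  # GPU isn't supported
    generation_config = openvino_genai.GenerationConfig(max_new_tokens=100)
    # Override the inconsistent config value with the tokenizer's eos token.
    generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())
    return pipe.generate(prompt, image=image, generation_config=generation_config)
```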

Speech Recognition Models (Whisper-based)

| Architecture | Models |
| --- | --- |
| WhisperForConditionalGeneration | Whisper, Distil-Whisper |
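A minimal transcription sketch with the openvino_genai WhisperPipeline; the function name and paths are placeholders, and librosa is assumed only as a convenient way to load audio as 16 kHz float samples:

```python
def transcribe(models_path: str, wav_path: str) -> str:
    """Transcribe an audio file with a converted Whisper model (sketch)."""
    import openvino_genai  # deferred: requires the openvino-genai package
    import librosa  # deferred: used only to load audio as float samples

    raw_speech, _ = librosa.load(wav_path, sr=16000)  # Whisper expects 16 kHz input
    pipe = openvino_genai.WhisperPipeline(models_path, "CPU")
    return str(pipe.generate(raw_speech.tolist()))
```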
info

Some models may require submitting an access request on their Hugging Face page before they can be downloaded.

If https://huggingface.co/ is down, the conversion step won't be able to download the models.