Models Optimization#

OpenVINO™ Training Extensions provides two types of optimization algorithms: Post-Training Quantization tool (PTQ) and Neural Network Compression Framework (NNCF).

Post-Training Quantization Tool#

PTQ optimizes model inference by applying post-training methods that require neither retraining nor fine-tuning. For more details on how PTQ works and on model optimization methods in general, refer to the documentation.

To run post-training quantization, the model must first be converted to the OpenVINO™ Intermediate Representation (IR). For fast and accurate quantization, the DefaultQuantization algorithm is used for each task. Refer to Tune quantization Parameters for further information about configuring the optimization.
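As a sketch, post-training quantization can be launched through the `otx optimize` command once an IR model is available. The paths, output directory, and exact flags below are assumptions and may vary between versions; consult the CLI help for your release.

```shell
# Hypothetical PTQ invocation, assuming the `otx` CLI is installed and the
# model was already exported to OpenVINO IR (e.g. via `otx export`).
WEIGHTS="outputs/openvino.xml"  # path to the exported IR (assumed name)

if [ -f "$WEIGHTS" ]; then
    # PTQ is applied when OpenVINO IR weights are passed to `otx optimize`
    otx optimize --load-weights "$WEIGHTS" --output ./ptq_model
else
    echo "Export the model to OpenVINO IR first, e.g. with 'otx export'."
fi
```

Passing IR weights (rather than a PyTorch checkpoint) is what selects the post-training path.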

PTQ parameters can be found and configured in template.yaml and configuration.yaml for each task. For the Anomaly and Semantic Segmentation tasks, there are separate PTQ configuration files located in the same directory as template.yaml; see, for example, the PaDiM and OCR-Lite-HRNet-18-mod2 models.

Neural Network Compression Framework#

NNCF provides training-time optimization: a set of advanced algorithms for compressing models during training within deep learning frameworks such as PyTorch. The optimization process is controlled by an NNCF configuration file, a JSON file that simplifies setting up the parameters of the compression algorithm. See the configuration file description.
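For illustration, a minimal NNCF configuration enabling quantization might look like the fragment below. The field values are illustrative only; consult the configuration file description for the full schema and the defaults shipped with each template.

```json
{
  "input_info": {
    "sample_size": [1, 3, 224, 224]
  },
  "compression": {
    "algorithm": "quantization",
    "initializer": {
      "range": { "num_init_samples": 300 }
    }
  }
}
```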

You can refer to the configuration files for the default templates of each task: Classification, Object Detection, Semantic Segmentation, Instance Segmentation, Anomaly Classification, Anomaly Detection, Anomaly Segmentation. Configs for other templates can be found in the same directory.

NNCF tends to preserve accuracy better because it uses training-time compression approaches. Compression results achievable with NNCF can be found here. PTQ, meanwhile, is faster but can degrade accuracy more than the training-enabled approach.

Note

The main recommendation is to start with post-training quantization and, if the results are not satisfactory, switch to NNCF compression during training.
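Following that recommendation, NNCF training-time compression can be launched with the same `otx optimize` command by passing a PyTorch checkpoint instead of an IR model. As above, the paths and flags are assumptions and may differ between versions.

```shell
# Hypothetical NNCF run, assuming the `otx` CLI: passing PyTorch weights
# (.pth) to `otx optimize` triggers training-time compression with NNCF.
WEIGHTS="outputs/weights.pth"  # trained checkpoint (assumed name)

if [ -f "$WEIGHTS" ]; then
    otx optimize --load-weights "$WEIGHTS" --output ./nncf_model
else
    echo "Train the model first so that a .pth checkpoint is available."
fi
```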

Refer to our dedicated tutorials on how to optimize your model using PTQ or NNCF.