nncf.experimental.torch.fx

Classes

OpenVINOQuantizer

Implementation of the Torch AO quantizer which annotates models with quantization annotations optimally for inference via OpenVINO.

Functions

quantize_pt2e(model, quantizer, calibration_dataset[, ...])

Applies post-training quantization to the provided torch.fx.GraphModule model.

nncf.experimental.torch.fx.quantize_pt2e(model, quantizer, calibration_dataset, subset_size=300, fast_bias_correction=True, smooth_quant=False, bias_correction_params=None, smooth_quant_params=None, activations_range_estimator_params=None, weights_range_estimator_params=None, batchwise_statistics=None, fold_quantize=True, do_copy=False)

Applies post-training quantization to the provided torch.fx.GraphModule model using the given torch.ao quantizer.

Parameters:
  • model (torch.fx.GraphModule) – A torch.fx.GraphModule instance to be quantized.

  • quantizer (torch.ao.quantization.quantizer.Quantizer) – Torch AO quantizer that annotates nodes in the graph with quantization settings, conveying the desired quantization scheme.

  • calibration_dataset (nncf.Dataset) – A representative dataset for the calibration process.

  • subset_size (int) – Size of the dataset subset used to calculate activation statistics for quantization.

  • fast_bias_correction (Optional[bool]) – When set to False, a different bias correction method is used: it is generally more accurate and takes more time, but requires less memory. When set to None, the bias correction algorithm is disabled.

  • smooth_quant (bool) – Setting this option to True enables the SmoothQuant algorithm.

  • bias_correction_params (Optional[nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters]) – Contains advanced parameters for fine-tuning the bias correction algorithm.

  • smooth_quant_params (Optional[nncf.quantization.advanced_parameters.AdvancedSmoothQuantParameters]) – Contains advanced alpha parameters for the SmoothQuant algorithm.

  • activations_range_estimator_params (Optional[nncf.quantization.range_estimator.RangeEstimatorParameters]) – Contains parameters for estimating the range of activations of the model.

  • weights_range_estimator_params (Optional[nncf.quantization.range_estimator.RangeEstimatorParameters]) – Contains parameters for estimating the range of weights of the model.

  • batchwise_statistics (Optional[bool]) – Determines whether quantizer statistics are calculated for each item of the batch or for the entire batch. The default is None, which resolves to True if batch_size > 1 and to False otherwise.

  • fold_quantize (bool) – Boolean flag that determines whether the quantize op is folded. True by default.

  • do_copy (bool) – If True, a copy of the given model is quantized; otherwise, the model is quantized in place. False by default.

Returns:

The quantized torch.fx.GraphModule instance.

Return type:

torch.fx.GraphModule
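
A minimal usage sketch follows. Here model, example_inputs, and calibration_loader are placeholders for user-supplied objects, and capturing the graph via torch.export is one possible way to obtain a torch.fx.GraphModule:

    import torch

    import nncf
    from nncf.experimental.torch.fx import OpenVINOQuantizer, quantize_pt2e

    # Placeholders: "model" is a torch.nn.Module, "example_inputs" is a tuple of
    # example tensors, and "calibration_loader" is an iterable of model inputs.
    exported_model = torch.export.export(model, example_inputs).module()

    quantizer = OpenVINOQuantizer()
    calibration_dataset = nncf.Dataset(calibration_loader)

    quantized_model = quantize_pt2e(
        exported_model,
        quantizer,
        calibration_dataset,
        subset_size=300,  # number of samples used to collect statistics
    )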

class nncf.experimental.torch.fx.OpenVINOQuantizer(*, mode=None, preset=None, target_device=TargetDevice.ANY, model_type=None, ignored_scope=None, overflow_fix=None, quantize_outputs=False, activations_quantization_params=None, weights_quantization_params=None, quantizer_propagation_rule=QuantizerPropagationRule.MERGE_ALL_IN_ONE)

Bases: torch.ao.quantization.quantizer.quantizer.Quantizer

Implementation of the Torch AO quantizer which annotates models with quantization annotations optimally for inference via OpenVINO.

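A minimal configuration sketch; every keyword below comes from the signature above, and the chosen values are illustrative only:

    import nncf
    from nncf.experimental.torch.fx import OpenVINOQuantizer

    # Illustrative configuration: quantize with a mixed preset for a
    # transformer model, skipping all "mul" operations.
    quantizer = OpenVINOQuantizer(
        preset=nncf.QuantizationPreset.MIXED,
        target_device=nncf.TargetDevice.CPU,
        model_type=nncf.ModelType.TRANSFORMER,
        ignored_scope=nncf.IgnoredScope(types=["mul"]),
    )
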
set_ignored_scope(names=None, patterns=None, types=None, subgraphs=None, validate=True)

Provides an option to specify portions of the model to be excluded from compression. The ignored scope defines model sub-graphs that should be excluded from the quantization process.

Parameters:
  • names (Optional[list[str]]) – List of ignored node names.

  • patterns (Optional[list[str]]) – List of regular expressions that define patterns for names of ignored nodes.

  • types (Optional[list[str]]) – List of ignored operation types.

  • subgraphs (Optional[list[tuple[list[str], list[str]]]]) – List of ignored subgraphs.

  • validate (bool) – If set to True, a RuntimeError is raised when any ignored scope entry does not match a node in the model graph.

Return type:

None
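
A short sketch of excluding nodes by name, pattern, and type; the names and patterns below are hypothetical and, with validate=True, must match nodes that actually exist in the graph:

    quantizer = OpenVINOQuantizer()
    quantizer.set_ignored_scope(
        names=["conv2d_3"],          # hypothetical exact node name
        patterns=[".*attention.*"],  # regular expression over node names
        types=["softmax"],           # operation types to exclude
    )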

annotate(model)

Adds quantization annotations to the nodes in the model graph in-place.

Parameters:

model (torch.fx.GraphModule) – A torch.fx.GraphModule to annotate.

Returns:

The torch.fx.GraphModule with updated annotations.

Return type:

torch.fx.GraphModule
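
annotate is normally invoked by torch.ao's PT2E entry points rather than called directly. A sketch of that flow, assuming a recent PyTorch that provides torch.export.export_for_training together with the prepare_pt2e/convert_pt2e APIs (model, example_inputs, and calibration_loader are placeholders):

    import torch
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

    from nncf.experimental.torch.fx import OpenVINOQuantizer

    captured = torch.export.export_for_training(model, example_inputs).module()

    quantizer = OpenVINOQuantizer()
    prepared = prepare_pt2e(captured, quantizer)  # calls quantizer.annotate() internally

    for batch in calibration_loader:  # calibrate the inserted observers;
        prepared(batch)               # each batch must match the model's inputs

    quantized = convert_pt2e(prepared)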

validate(model)

Validates the annotated model before the insertion of FakeQuantizers / observers.

Parameters:

model (torch.fx.GraphModule) – Annotated torch.fx.GraphModule to validate after the annotation.

Return type:

None

transform_for_annotation(model)

Allows user-defined transforms to run before the graph is annotated. This lets the quantizer handle parts of the model that would otherwise not be quantizable. For example, the quantizer can a) decompose a compound operator such as scaled dot product attention into bmm and softmax if it knows how to quantize bmm/softmax but not sdpa, or b) convert scalars to tensors so that scalars can be quantized.

Note: this is an optional method.

Parameters:

model (torch.fx.GraphModule) – Given torch.fx.GraphModule to transform before the annotation.

Returns:

The transformed torch.fx.GraphModule ready for the annotation.

Return type:

torch.fx.GraphModule
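
A hypothetical subclass sketch; my_decomposition_pass stands in for any user-defined graph rewrite and is not part of NNCF:

    import torch.fx

    from nncf.experimental.torch.fx import OpenVINOQuantizer

    class MyQuantizer(OpenVINOQuantizer):
        def transform_for_annotation(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
            # Hypothetical pre-annotation rewrite, e.g. decomposing an
            # unsupported compound op into quantizable primitives.
            return my_decomposition_pass(model)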