nncf.experimental.torch.fx
Classes

- OpenVINOQuantizer – Implementation of the Torch AO quantizer which annotates models with quantization annotations optimally for inference via OpenVINO.

Functions

- quantize_pt2e – Applies post-training quantization to a given torch.fx.GraphModule model.
- nncf.experimental.torch.fx.quantize_pt2e(model, quantizer, calibration_dataset, subset_size=300, fast_bias_correction=True, smooth_quant=False, bias_correction_params=None, smooth_quant_params=None, activations_range_estimator_params=None, weights_range_estimator_params=None, batchwise_statistics=None, fold_quantize=True, do_copy=False)
Applies post-training quantization to the given torch.fx.GraphModule model using the provided Torch AO quantizer.
- Parameters:
model (torch.fx.GraphModule) – A torch.fx.GraphModule instance to be quantized.
quantizer (torch.ao.quantization.quantizer.Quantizer) – A Torch AO quantizer used to annotate nodes in the graph with quantization setups that convey the desired way of quantization.
calibration_dataset (nncf.Dataset) – A representative dataset for the calibration process.
subset_size (int) – Size of the dataset subset used to calculate activation statistics for quantization.
fast_bias_correction (Optional[bool]) – Setting this option to False enables a different bias correction method that is, in general, more accurate and requires less memory, but takes more time. None disables the bias correction algorithm.
smooth_quant (bool) – Setting this option to True enables the SmoothQuant algorithm.
bias_correction_params (Optional[nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters]) – Contains advanced parameters for fine-tuning bias correction algorithm.
smooth_quant_params (Optional[nncf.quantization.advanced_parameters.AdvancedSmoothQuantParameters]) – Contains advanced alpha parameters for SmoothQuant algorithm.
activations_range_estimator_params (Optional[nncf.quantization.range_estimator.RangeEstimatorParameters]) – Contains parameters for estimating the range of activations of the model.
weights_range_estimator_params (Optional[nncf.quantization.range_estimator.RangeEstimatorParameters]) – Contains parameters for estimating the range of weights of the model.
batchwise_statistics (Optional[bool]) – Determines whether quantizer statistics should be calculated for each item of the batch or for the entire batch. The default is None, which resolves to True if batch_size > 1 and to False otherwise.
fold_quantize (bool) – Boolean flag controlling whether to fold the quantize op or not. True by default.
do_copy (bool) – If True, a copy of the given model is quantized; otherwise, the model is quantized in place. False by default.
- Returns:
The quantized torch.fx.GraphModule instance.
- Return type:
torch.fx.GraphModule
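A minimal end-to-end sketch of how the function can be used. The toy model, the capture step, and the synthetic calibration data below are illustrative assumptions, not part of the API, and the exact capture call may vary across torch versions:

import torch
import nncf
from nncf.experimental.torch.fx import OpenVINOQuantizer, quantize_pt2e

# Toy model and example input (assumptions for illustration).
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
example_input = torch.randn(1, 16)

# Capture the model as a torch.fx.GraphModule via torch.export.
fx_model = torch.export.export(model, (example_input,)).module()

# A representative dataset; items are passed to the model as-is.
calibration_dataset = nncf.Dataset([torch.randn(1, 16) for _ in range(100)])

quantizer = OpenVINOQuantizer()
quantized_model = quantize_pt2e(
    fx_model, quantizer, calibration_dataset, subset_size=100
)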
- class nncf.experimental.torch.fx.OpenVINOQuantizer(*, mode=None, preset=None, target_device=TargetDevice.ANY, model_type=None, ignored_scope=None, overflow_fix=None, quantize_outputs=False, activations_quantization_params=None, weights_quantization_params=None, quantizer_propagation_rule=QuantizerPropagationRule.MERGE_ALL_IN_ONE)
Bases: torch.ao.quantization.quantizer.quantizer.Quantizer
Implementation of the Torch AO quantizer which annotates models with quantization annotations optimally for the inference via OpenVINO.
- Parameters:
mode (Optional[nncf.QuantizationMode]) – Defines the optimization mode for the algorithm. None by default.
preset (Optional[nncf.QuantizationPreset]) – A preset that controls the quantization mode (symmetric and asymmetric). It can take the following values: performance (symmetric quantization of weights and activations) or mixed (symmetric quantization of weights and asymmetric quantization of activations). Default value is None; in this case, the mixed preset is used for the transformer model type, and performance otherwise.
target_device (nncf.TargetDevice) – A target device whose specifics are taken into account during compression in order to obtain the best performance for that type of device. Defaults to TargetDevice.ANY.
model_type (Optional[nncf.ModelType]) – Model type is needed to specify additional patterns in the model. Only transformer is supported at the moment.
ignored_scope (Optional[nncf.IgnoredScope]) – An ignored scope that defines a list of model control flow graph nodes to be ignored during quantization.
overflow_fix (Optional[nncf.OverflowFix]) – This option controls whether to apply the overflow issue fix for 8-bit quantization.
quantize_outputs (bool) – Whether to insert additional quantizers right before each of the model outputs.
activations_quantization_params (Optional[Union[nncf.quantization.advanced_parameters.QuantizationParameters, nncf.quantization.advanced_parameters.FP8QuantizationParameters]]) – Quantization parameters for model activations.
weights_quantization_params (Optional[Union[nncf.quantization.advanced_parameters.QuantizationParameters, nncf.quantization.advanced_parameters.FP8QuantizationParameters]]) – Quantization parameters for model weights.
quantizer_propagation_rule (nncf.common.quantization.quantizer_propagation.structs.QuantizerPropagationRule) – The strategy to be used while propagating and merging quantizers. MERGE_ALL_IN_ONE by default.
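For instance, the quantizer can be configured at construction time. The parameter choices below are illustrative assumptions rather than recommended settings, and the ignored node name is hypothetical:

import nncf
from nncf.experimental.torch.fx import OpenVINOQuantizer

quantizer = OpenVINOQuantizer(
    preset=nncf.QuantizationPreset.MIXED,    # asymmetric quantization of activations
    model_type=nncf.ModelType.TRANSFORMER,   # enable transformer-specific patterns
    ignored_scope=nncf.IgnoredScope(names=["some_node_name"]),  # hypothetical node name
)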
- set_ignored_scope(names=None, patterns=None, types=None, subgraphs=None, validate=True)
Provides an option to specify portions of the model to be excluded from compression. The ignored scope defines model sub-graphs that should be excluded from the quantization process.
- Parameters:
names (Optional[list[str]]) – List of ignored node names.
patterns (Optional[list[str]]) – List of regular expressions that define patterns for names of ignored nodes.
types (Optional[list[str]]) – List of ignored operation types.
subgraphs (Optional[list[tuple[list[str], list[str]]]]) – List of ignored subgraphs.
validate (bool) – If set to True, a RuntimeError will be raised if any of the specified ignored scopes does not match a node in the model graph.
- Return type:
None
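A short sketch of setting the ignored scope after construction; the operation types and the regular expression below are hypothetical examples:

quantizer = OpenVINOQuantizer()
quantizer.set_ignored_scope(
    types=["mul", "sub"],        # hypothetical operation types to skip
    patterns=[".*attention.*"],  # hypothetical regex over node names
    validate=False,              # do not raise if nothing matches
)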
- annotate(model)
Adds quantization annotations to the nodes in the model graph in-place.
- Parameters:
model (torch.fx.GraphModule) – A torch.fx.GraphModule to annotate.
- Returns:
The torch.fx.GraphModule with updated annotations.
- Return type:
torch.fx.GraphModule
- validate(model)
Validates the annotated model before the insertion of FakeQuantizers / observers.
- Parameters:
model (torch.fx.GraphModule) – Annotated torch.fx.GraphModule to validate after the annotation.
- Return type:
None
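Both methods can be called directly, although the Torch AO PT2E entry points normally invoke them for you; a sketch that reuses fx_model from the quantize_pt2e example above (an assumption):

quantizer = OpenVINOQuantizer()
annotated_model = quantizer.annotate(fx_model)  # in-place annotation
quantizer.validate(annotated_model)             # check annotations before observer insertion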
- transform_for_annotation(model)
Allows user-defined transforms to run before the graph is annotated. This lets the quantizer handle parts of the model that are otherwise not quantizable. For example, the quantizer can (a) decompose a compound operator such as scaled dot product attention into bmm and softmax, if it knows how to quantize bmm/softmax but not sdpa, or (b) transform scalars to tensors so that the scalars can be quantized.
Note: this method is optional.
- Parameters:
model (torch.fx.GraphModule) – Given torch.fx.GraphModule to transform before the annotation.
- Returns:
The transformed torch.fx.GraphModule ready for the annotation.
- Return type:
torch.fx.GraphModule
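Since OpenVINOQuantizer subclasses the Torch AO Quantizer, it can also be plugged into the native PT2E flow, which calls transform_for_annotation, annotate, and validate internally. A sketch reusing fx_model and example_input from the quantize_pt2e example above (both assumptions):

from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from nncf.experimental.torch.fx import OpenVINOQuantizer

quantizer = OpenVINOQuantizer()
prepared_model = prepare_pt2e(fx_model, quantizer)  # inserts observers per the annotations
prepared_model(example_input)                       # calibration forward pass
quantized_model = convert_pt2e(prepared_model)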