nncf.quantization.advanced_parameters

Structures and functions for passing advanced parameters to NNCF post-training quantization APIs.

Classes

FP8Type

Defines FP8 special types (https://arxiv.org/pdf/2209.05433.pdf).

QuantizationParameters

Contains quantization parameters for weights or activations.

FP8QuantizationParameters

Contains conversion parameters for weights or activations.

AdvancedBiasCorrectionParameters

Contains advanced parameters for fine-tuning the bias correction algorithm.

AdvancedSmoothQuantParameters

Contains advanced alpha parameters for the SmoothQuant algorithm.

AdvancedQuantizationParameters

Contains advanced parameters for fine-tuning the quantization algorithm.

AdvancedAccuracyRestorerParameters

Contains advanced parameters for fine-tuning the accuracy restorer algorithm.

class nncf.quantization.advanced_parameters.FP8Type

Bases: nncf.parameters.StrEnum

Defines FP8 special types (https://arxiv.org/pdf/2209.05433.pdf).

Parameters:
  • E4M3 – Mode with 4-bit exponent and 3-bit mantissa.

  • E5M2 – Mode with 5-bit exponent and 2-bit mantissa.
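To make the two layouts concrete, here is a minimal sketch of a decoder for a single FP8 byte. It follows the encoding described in the paper linked above (exponent bias 7 for E4M3, 15 for E5M2; E5M2 keeps IEEE-style infinities, while E4M3 reserves only the all-ones pattern for NaN). The function name `decode_fp8` is hypothetical and is not part of NNCF.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode one FP8 byte (hypothetical helper, not an NNCF function).

    Use exp_bits=4, man_bits=3 for E4M3 and exp_bits=5, man_bits=2 for E5M2.
    """
    if not 0 <= byte <= 0xFF or (exp_bits, man_bits) not in ((4, 3), (5, 2)):
        raise ValueError("expected a byte and an E4M3 or E5M2 layout")

    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    exp_all_ones = (1 << exp_bits) - 1

    if exp_bits == 5 and exp == exp_all_ones:
        # E5M2 follows IEEE conventions: infinities and NaNs.
        return sign * math.inf if man == 0 else math.nan
    if exp_bits == 4 and exp == exp_all_ones and man == (1 << man_bits) - 1:
        # E4M3 has no infinities; only the all-ones pattern is NaN.
        return math.nan
    if exp == 0:
        # Subnormal numbers: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# Largest finite magnitudes of the two formats.
e4m3_max = decode_fp8(0x7E, 4, 3)
e5m2_max = decode_fp8(0x7B, 5, 2)
```

The trade-off is visible in the two maxima: E4M3 spends an extra bit on precision and tops out much earlier than E5M2, which spends that bit on dynamic range.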

class nncf.quantization.advanced_parameters.QuantizationParameters

Contains quantization parameters for weights or activations.

Parameters:
  • num_bits (Optional[int]) – The number of bits to use for quantization.

  • mode (nncf.common.quantization.structs.QuantizationMode) – The quantization mode to use, such as ‘symmetric’, ‘asymmetric’, etc.

  • signedness_to_force (Optional[bool]) – Whether to force the weights or activations to be signed (True) or unsigned (False).

  • per_channel (Optional[bool]) – True if per-channel quantization is used, and False if per-tensor quantization is used.

  • narrow_range (Optional[bool]) –

    Whether to use a narrow quantization range.

    If False, the input is quantized into the range:

    • [0; 2^num_bits - 1] for unsigned quantization and

    • [-2^(num_bits - 1); 2^(num_bits - 1) - 1] for signed quantization

    If True, the ranges are:

    • [0; 2^num_bits - 2] for unsigned quantization and

    • [-2^(num_bits - 1) + 1; 2^(num_bits - 1) - 1] for signed quantization
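The four ranges listed for narrow_range can be reproduced with a few lines of arithmetic. This is an illustrative sketch only; `quantization_range` is a hypothetical helper, not an NNCF function.

```python
from typing import Tuple


def quantization_range(num_bits: int, signed: bool, narrow_range: bool) -> Tuple[int, int]:
    """Return (low, high) integer levels for the given settings (hypothetical helper)."""
    shrink = 1 if narrow_range else 0
    if signed:
        # Narrow range drops the most negative level, making the range symmetric.
        low = -(2 ** (num_bits - 1)) + shrink
        high = 2 ** (num_bits - 1) - 1
    else:
        # Narrow range drops the topmost level.
        low = 0
        high = 2 ** num_bits - 1 - shrink
    return low, high


full_signed = quantization_range(8, signed=True, narrow_range=False)
narrow_signed = quantization_range(8, signed=True, narrow_range=True)
```

For 8 bits, the signed narrow range is symmetric around zero, unlike the full signed range.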

class nncf.quantization.advanced_parameters.FP8QuantizationParameters

Contains conversion parameters for weights or activations.

Parameters:
  • destination_type (FP8Type) – The target FP8 format; currently E4M3 or E5M2.

class nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters

Contains advanced parameters for fine-tuning the bias correction algorithm.

Parameters:
  • apply_for_all_nodes (bool) – Whether to apply the correction to all nodes in the model, or only to nodes that have a bias.

  • threshold (Optional[float]) – The maximum allowed bias correction value. Bias correction is skipped if the value is higher than the threshold.
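The role of threshold can be sketched as a simple guard. How NNCF measures the "bias correction value" is not stated here, so this sketch assumes it is the largest absolute shift across channels; `maybe_correct_bias` and its arguments are hypothetical names.

```python
from typing import List, Optional


def maybe_correct_bias(
    bias: List[float], shift: List[float], threshold: Optional[float] = None
) -> List[float]:
    """Apply a per-channel bias shift unless it exceeds the threshold.

    Assumption: the correction magnitude is the largest absolute shift.
    """
    if threshold is not None and max(abs(s) for s in shift) > threshold:
        # Correction is considered too large to be trustworthy; keep the bias as is.
        return list(bias)
    return [b + s for b, s in zip(bias, shift)]


corrected = maybe_correct_bias([1.0, 2.0], [0.5, -0.25], threshold=1.0)
skipped = maybe_correct_bias([1.0, 2.0], [0.5, -0.25], threshold=0.4)
```

With no threshold (None), the guard is inactive and the correction is always applied.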

class nncf.quantization.advanced_parameters.AdvancedSmoothQuantParameters

Contains advanced alpha parameters for the SmoothQuant algorithm. Each alpha regulates the calculation of the smooth scale for one node type. A negative value switches off the algorithm for that node type. If results are inaccurate, adjust the value within the range from 0 to 1, or set it to -1 to disable the SmoothQuant algorithm for that node type.

Parameters:
  • convolution (float) – Alpha for Convolution layers. A negative value disables smoothing for them.

  • matmul (float) – Alpha for MatMul layers. A negative value disables smoothing for them.
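As a rough illustration of what alpha regulates, the sketch below uses the per-channel scale from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). Whether NNCF computes the scale exactly this way is an assumption; `smooth_scales` is a hypothetical helper.

```python
from typing import List, Optional


def smooth_scales(
    act_absmax: List[float], weight_absmax: List[float], alpha: float
) -> Optional[List[float]]:
    """Per-channel smooth scales following the SmoothQuant paper's formula.

    Returns None for a negative alpha, mirroring "smoothing disabled".
    """
    if alpha < 0:
        return None
    return [
        (a ** alpha) / (w ** (1.0 - alpha))
        for a, w in zip(act_absmax, weight_absmax)
    ]


balanced = smooth_scales([16.0], [4.0], alpha=0.5)
disabled = smooth_scales([16.0], [4.0], alpha=-1)
```

A larger alpha shifts more of the quantization difficulty from activations to weights; alpha = 0.5 splits it evenly.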

class nncf.quantization.advanced_parameters.AdvancedQuantizationParameters

Contains advanced parameters for fine-tuning the quantization algorithm.

Parameters:
  • overflow_fix (nncf.quantization.advanced_parameters.OverflowFix) – Controls whether to apply the overflow fix for 8-bit quantization.

  • quantize_outputs (bool) – Whether to insert additional quantizers right before each of the model outputs.

  • inplace_statistics (bool) – Whether to calculate quantizer statistics with backend graph operations or with the default Python implementation. Defaults to True.

  • disable_channel_alignment (bool) – Whether to disable the channel alignment.

  • disable_bias_correction (bool) – Whether to disable the bias correction.

  • batchwise_statistics (Optional[bool]) – Whether quantizer statistics are calculated for each item of the batch or for the entire batch. Defaults to None, which means: if a torch.DataLoader or tensorflow.Dataset was passed as the data source for the calibration dataset and its batch_size is greater than 1, batchwise_statistics is set to True; otherwise it is set to False.

  • activations_quantization_params (nncf.quantization.advanced_parameters.QuantizationParameters) – Quantization parameters for activations.

  • weights_quantization_params (nncf.quantization.advanced_parameters.QuantizationParameters) – Quantization parameters for weights.

  • activations_range_estimator_params (nncf.quantization.range_estimator.RangeEstimatorParameters) – Range estimator parameters for activations.

  • weights_range_estimator_params (nncf.quantization.range_estimator.RangeEstimatorParameters) – Range estimator parameters for weights.

  • bias_correction_params (nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters) – Advanced bias correction parameters.

  • smooth_quant_alphas – SmoothQuant-related parameters mapping. It regulates the calculation of the smooth scale. The default values are stored in AdvancedSmoothQuantParameters. A negative value in a field switches off smoothing for that node type. If results are inaccurate, adjust a field within the range from 0 to 1, or set it to -1 to disable smoothing for that type.

  • smooth_quant_alpha (float) – Deprecated SmoothQuant-related parameter.

  • backend_params (Dict[str, Any]) – Backend-specific parameters.
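A minimal sketch of how these fields might be combined, using only the class and parameter names documented on this page. The chosen values are arbitrary illustrations, not recommendations, and the import is guarded so the snippet also runs where NNCF is not installed.

```python
try:
    from nncf.quantization.advanced_parameters import (
        AdvancedQuantizationParameters,
        AdvancedSmoothQuantParameters,
        QuantizationParameters,
    )
except ImportError:
    # NNCF is not available in this environment; nothing to configure.
    advanced_params = None
else:
    advanced_params = AdvancedQuantizationParameters(
        # Also quantize right before each model output.
        quantize_outputs=True,
        # Keep bias correction enabled.
        disable_bias_correction=False,
        # 8-bit, per-tensor quantization for activations.
        activations_quantization_params=QuantizationParameters(
            num_bits=8, per_channel=False
        ),
        # Smooth MatMul layers only; -1 disables smoothing for convolutions.
        smooth_quant_alphas=AdvancedSmoothQuantParameters(
            matmul=0.95, convolution=-1
        ),
    )
```

The resulting object would typically be passed to a post-training quantization call through its advanced_parameters argument; fields left unset keep their defaults.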

class nncf.quantization.advanced_parameters.AdvancedAccuracyRestorerParameters

Contains advanced parameters for fine-tuning the accuracy restorer algorithm.

Parameters:
  • max_num_iterations (int) – The maximum number of iterations of the algorithm. In other words, the maximum number of layers that may be reverted back to floating-point precision. By default, it is limited by the overall number of quantized layers.

  • tune_hyperparams (bool) – Whether to tune the quantization parameters as a preliminary step before reverting layers back to floating-point precision. It can bring an additional boost in performance and accuracy, at the cost of increased overall quantization time. The default value is False.

  • ranking_subset_size (Optional[int]) – Size of a subset that is used to rank layers by their contribution to the accuracy drop.

  • num_ranking_workers (Optional[int]) – The number of parallel workers that are used to rank quantization operations.

  • intermediate_model_dir (Optional[str]) – Path to the folder where the model, which was fully quantized with initial parameters, should be saved.

  • restore_mode (RestoreMode) – Specifies how to revert operations to their original precision.
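The interplay between layer ranking and max_num_iterations can be pictured with a toy loop. This is a simplified sketch under stated assumptions, not NNCF's implementation: layers are already ranked by their contribution to the accuracy drop, one layer is reverted per iteration, and `restore_accuracy`, `evaluate_drop`, and `target_drop` are hypothetical names.

```python
from typing import Callable, List, Optional


def restore_accuracy(
    ranked_layers: List[str],
    evaluate_drop: Callable[[List[str]], int],
    target_drop: int,
    max_num_iterations: Optional[int] = None,
) -> List[str]:
    """Revert ranked layers one by one until the accuracy drop is acceptable.

    Assumption: None means "limited only by the number of quantized layers".
    """
    limit = len(ranked_layers)
    if max_num_iterations is not None:
        limit = min(limit, max_num_iterations)

    reverted: List[str] = []
    for layer in ranked_layers[:limit]:
        if evaluate_drop(reverted) <= target_drop:
            break  # Accuracy is already within the allowed drop.
        reverted.append(layer)  # Revert this layer to floating-point precision.
    return reverted


# Toy model: each reverted layer recovers 2 units of a 5-unit accuracy drop.
def toy_drop(reverted: List[str]) -> int:
    return 5 - 2 * len(reverted)


unbounded = restore_accuracy(["conv1", "conv2", "fc"], toy_drop, target_drop=1)
bounded = restore_accuracy(["conv1", "conv2", "fc"], toy_drop, target_drop=1, max_num_iterations=1)
```

Capping the iteration count bounds the number of layers that fall back to floating point, at the risk of stopping before the target drop is reached.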