nncf.quantization.advanced_parameters
Structures and functions for passing advanced parameters to NNCF post-training quantization APIs.
Classes
FP8Type | Defines FP8 special types (https://arxiv.org/pdf/2209.05433.pdf).
GroupSizeFallbackMode | Specifies how to handle nodes that do not support the given group size.
QuantizationParameters | Contains quantization parameters for weights or activations.
FP8QuantizationParameters | Contains conversion parameters for weights or activations.
AdvancedBiasCorrectionParameters | Contains advanced parameters for fine-tuning bias correction algorithm.
AdvancedSmoothQuantParameters | Contains advanced alpha parameters for SmoothQuant algorithm.
AdvancedQuantizationParameters | Contains advanced parameters for fine-tuning quantization algorithm.
AdvancedAWQParameters | Contains advanced parameters for AWQ algorithm.
AdvancedScaleEstimationParameters | Contains advanced parameters for scale estimation algorithm.
AdvancedGPTQParameters | Contains advanced parameters for GPTQ algorithm.
AdvancedLoraCorrectionParameters | Contains advanced parameters for LoRA correction algorithm.
AdvancedCompressionParameters | Contains advanced parameters for compression algorithms.
AdvancedAccuracyRestorerParameters | Contains advanced parameters for fine-tuning the accuracy restorer algorithm.
- class nncf.quantization.advanced_parameters.FP8Type
Bases:
nncf.parameters.StrEnum
Defines FP8 special types (https://arxiv.org/pdf/2209.05433.pdf).
- Parameters:
E4M3 – Mode with 4-bit exponent and 3-bit mantissa.
E5M2 – Mode with 5-bit exponent and 2-bit mantissa.
- class nncf.quantization.advanced_parameters.GroupSizeFallbackMode
Bases:
nncf.parameters.StrEnum
Specifies how to handle nodes that do not support the given group size.
- Parameters:
ERROR – Raise an error if the given group size is not supported by a node.
IGNORE – Skip nodes that cannot be compressed with the given group size.
ADJUST –
Automatically compute a suitable group size for unsupported nodes. When selected, each weight whose channel size is not divisible by the general group size value is compressed with a newly calculated group size. The new group size is the largest power of two (i.e., 2^k) such that:
the channel size is divisible by it;
it is less than the originally specified group size value;
it is greater than or equal to min_adjusted_group_size.
If no value satisfies these requirements, the weight is compressed to the backup precision. If ratio < 1.0 and some weights have to be compressed to the backup precision because of group size issues, these weights do not count toward the ratio of the backup mode group.
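The ADJUST rule above can be sketched in plain Python. This is a minimal illustration of the three stated conditions, not NNCF's implementation; the function name `adjust_group_size` is hypothetical, and the default of 16 mirrors the documented default of min_adjusted_group_size.

```python
from typing import Optional


def adjust_group_size(channel_size: int, group_size: int, min_adjusted_group_size: int = 16) -> Optional[int]:
    """Return a group size usable for `channel_size`, or None if none exists.

    Hypothetical helper: it follows the documented ADJUST conditions only.
    """
    if channel_size % group_size == 0:
        return group_size  # the requested group size is already supported

    # Find the largest power of two strictly below the requested group size.
    candidate = 1
    while candidate * 2 < group_size:
        candidate *= 2

    # Walk down through powers of two until one divides the channel size.
    while candidate >= min_adjusted_group_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2

    return None  # the caller would fall back to the backup precision


print(adjust_group_size(4096, 128))  # 128
print(adjust_group_size(2880, 128))  # 64
print(adjust_group_size(1000, 128))  # None
```

In the last call no power of two between 16 and 64 divides 1000, so the weight would be compressed to the backup precision.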
- class nncf.quantization.advanced_parameters.QuantizationParameters
Contains quantization parameters for weights or activations.
- Parameters:
num_bits (Optional[int]) – The number of bits to use for quantization.
mode (nncf.common.quantization.structs.QuantizationMode) – The quantization mode to use, such as ‘symmetric’, ‘asymmetric’, etc.
signedness_to_force (Optional[bool]) – Whether to force the weights or activations to be signed (True) or unsigned (False).
per_channel (Optional[bool]) – True if per-channel quantization is used, and False if per-tensor quantization is used.
narrow_range (Optional[bool]) –
Whether to use a narrow quantization range.
If False, the input is quantized into the range
[0; 2^num_bits - 1] for unsigned quantization and
[-2^(num_bits - 1); 2^(num_bits - 1) - 1] for signed quantization.
If True, the ranges are
[0; 2^num_bits - 2] for unsigned quantization and
[-2^(num_bits - 1) + 1; 2^(num_bits - 1) - 1] for signed quantization.
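The four ranges above can be checked with a small helper. This is an illustrative sketch that restates the formulas only; `quantization_range` is a hypothetical name, not an NNCF function.

```python
def quantization_range(num_bits: int, signed: bool, narrow_range: bool) -> tuple[int, int]:
    """Return the (low, high) integer range described for narrow_range."""
    if signed:
        # Narrow range drops the most negative level, making the range symmetric.
        low = -(2 ** (num_bits - 1)) + (1 if narrow_range else 0)
        high = 2 ** (num_bits - 1) - 1
    else:
        # Narrow range drops the highest level.
        low = 0
        high = 2**num_bits - 1 - (1 if narrow_range else 0)
    return low, high


print(quantization_range(8, signed=False, narrow_range=False))  # (0, 255)
print(quantization_range(8, signed=False, narrow_range=True))   # (0, 254)
print(quantization_range(8, signed=True, narrow_range=False))   # (-128, 127)
print(quantization_range(8, signed=True, narrow_range=True))    # (-127, 127)
```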
- class nncf.quantization.advanced_parameters.FP8QuantizationParameters
Contains conversion parameters for weights or activations.
- Parameters:
destination_type (FP8Type) – The target FP8 type; currently E4M3 or E5M2.
- class nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters
Contains advanced parameters for fine-tuning bias correction algorithm.
- Parameters:
apply_for_all_nodes (bool) – Whether to apply the correction to all nodes in the model, or only to nodes that have a bias.
threshold (Optional[float]) – The maximum allowed bias correction value. The bias correction is skipped if the value is higher than the threshold.
- class nncf.quantization.advanced_parameters.AdvancedSmoothQuantParameters
Contains advanced alpha parameters for the SmoothQuant algorithm. Each alpha regulates the calculation of the smooth scale for one node type. A negative value switches off the algorithm for that node type. In case of inaccurate results, a parameter may be adjusted in the range from 0 to 1, or set to -1 to disable SmoothQuant for that node type.
- Parameters:
convolution (float) – Alpha value used to smooth Convolution layers; a negative value disables smoothing for them.
matmul (float) – Alpha value used to smooth MatMul layers; a negative value disables smoothing for them.
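To show how an alpha value can regulate a smooth scale, the sketch below uses the per-channel formula from the SmoothQuant paper, s = max|X|^alpha / max|W|^(1 - alpha). Whether NNCF computes its scales exactly this way is an assumption; `smooth_scales` is a hypothetical helper, and the inputs are assumed to be positive per-channel absolute maxima.

```python
from typing import Optional


def smooth_scales(activation_abs_max: list[float], weight_abs_max: list[float], alpha: float) -> Optional[list[float]]:
    """Per-channel smooth scales following the SmoothQuant paper's formula.

    Assumption: inputs are positive per-channel absolute maxima.
    """
    if alpha < 0:
        return None  # a negative alpha switches smoothing off for this node type
    return [a**alpha / w ** (1.0 - alpha) for a, w in zip(activation_abs_max, weight_abs_max)]


# alpha = 0.5 balances activation and weight ranges equally.
print(smooth_scales([4.0, 16.0], [1.0, 4.0], alpha=0.5))
# alpha = -1 disables smoothing.
print(smooth_scales([4.0, 16.0], [1.0, 4.0], alpha=-1))
```

Larger alpha values shift more of the quantization difficulty from activations to weights, which is why tuning alpha between 0 and 1 can help when results are inaccurate.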
- class nncf.quantization.advanced_parameters.AdvancedQuantizationParameters
Contains advanced parameters for fine-tuning quantization algorithm.
- Parameters:
overflow_fix (Optional[nncf.quantization.advanced_parameters.OverflowFix]) – This option controls whether to apply the overflow issue fix for the 8-bit quantization.
quantize_outputs (bool) – Whether to insert additional quantizers right before each of the model outputs.
inplace_statistics (bool) – Defines whether to calculate quantizers statistics by backend graph operations or by default Python implementation, defaults to True.
disable_channel_alignment (bool) – Whether to disable the channel alignment.
disable_bias_correction (bool) – Whether to disable the bias correction.
batchwise_statistics (Optional[bool]) – Determines whether quantizer statistics are calculated for each item of the batch or for the entire batch. The default is None, which means: if a torch.DataLoader or tensorflow.Dataset was passed as the data source for the calibration dataset and its batch_size is greater than 1, batchwise_statistics is set to True; otherwise it is set to False.
quantizer_propagation_rule (QuantizerPropagationRule) – An instance of the QuantizerPropagationRule enum that specifies how quantizers are propagated and merged across branching nodes in the model’s computational graph. The strategies are as follows:
- DO_NOT_MERGE_BRANCHES: No merging of quantization parameters across branches.
- MERGE_IF_ALL_BRANCHES_SAME: Merge only if all branch quantization configurations are identical.
- MERGE_WITH_POTENTIAL_REQUANTIZATION: Merge common configurations and allow for requantization on branches with additional options.
- MERGE_ALL_IN_ONE: Attempt to merge into a single global quantization configuration if possible given hardware constraints.
MERGE_ALL_IN_ONE is the default value.
activations_quantization_params (nncf.quantization.advanced_parameters.QuantizationParameters) – Quantization parameters for activations.
weights_quantization_params (nncf.quantization.advanced_parameters.QuantizationParameters) – Quantization parameters for weights.
activations_range_estimator_params (nncf.quantization.range_estimator.RangeEstimatorParameters) – Range estimator parameters for activations.
weights_range_estimator_params (nncf.quantization.range_estimator.RangeEstimatorParameters) – Range estimator parameters for weights.
bias_correction_params (nncf.quantization.advanced_parameters.AdvancedBiasCorrectionParameters) – Advanced bias correction parameters.
smooth_quant_alphas (nncf.quantization.advanced_parameters.AdvancedSmoothQuantParameters) – SmoothQuant-related parameters mapping. It regulates the calculation of the smooth scale. The default values are stored in AdvancedSmoothQuantParameters. A negative value in a field switches off smoothing for the corresponding node type. In case of inaccurate results, fields may be adjusted in the range from 0 to 1, or set to -1 to disable smoothing for that type.
smooth_quant_alpha (float) – Deprecated SmoothQuant-related parameter.
backend_params (dict[str, Any]) – Backend-specific parameters.
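As a usage sketch, the options above are bundled into one AdvancedQuantizationParameters object and passed to nncf.quantize through its advanced_parameters argument. The field names come from the list above; the concrete values and the helper name `build_advanced_quantization_parameters` are illustrative assumptions, not recommended settings.

```python
def build_advanced_quantization_parameters():
    """Bundle a few advanced options into one object (requires the `nncf` package)."""
    # Imported lazily so the helper can be defined without NNCF installed.
    from nncf.quantization.advanced_parameters import (
        AdvancedBiasCorrectionParameters,
        AdvancedQuantizationParameters,
        AdvancedSmoothQuantParameters,
        QuantizationParameters,
    )

    return AdvancedQuantizationParameters(
        quantize_outputs=False,
        disable_bias_correction=False,
        # All values below are illustrative only.
        weights_quantization_params=QuantizationParameters(num_bits=8, per_channel=True),
        bias_correction_params=AdvancedBiasCorrectionParameters(apply_for_all_nodes=False, threshold=1000.0),
        # Disable smoothing for Convolution nodes, soften it for MatMul nodes.
        smooth_quant_alphas=AdvancedSmoothQuantParameters(convolution=-1, matmul=0.5),
    )


# Typical call site; `model` and `calibration_dataset` are placeholders:
# quantized_model = nncf.quantize(
#     model, calibration_dataset, advanced_parameters=build_advanced_quantization_parameters()
# )
```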
- class nncf.quantization.advanced_parameters.AdvancedAWQParameters
Contains advanced parameters for AWQ algorithm.
- Parameters:
subset_size (int) – The number of samples for AWQ.
percent_to_apply (float) – The percent of outliers for correction.
alpha_min (float) – Minimum value of smoothness parameter for grid search.
alpha_max (float) – Maximum value of smoothness parameter for grid search.
steps (int) – The number of steps in grid search.
prefer_data_aware_scaling (bool) – Determines whether to use activations to calculate scales if activations are available.
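The relationship between alpha_min, alpha_max, and steps can be pictured as a list of candidate smoothness values. The sketch below assumes evenly spaced candidates starting at alpha_min with a step of (alpha_max - alpha_min) / steps; the exact spacing and whether alpha_max itself is evaluated are assumptions, and `awq_alpha_candidates` is a hypothetical helper.

```python
def awq_alpha_candidates(alpha_min: float, alpha_max: float, steps: int) -> list[float]:
    """Evenly spaced smoothness candidates for a grid search (assumed layout)."""
    step = (alpha_max - alpha_min) / steps
    return [alpha_min + i * step for i in range(steps)]


print(awq_alpha_candidates(0.0, 1.0, 4))  # [0.0, 0.25, 0.5, 0.75]
```

A grid search would evaluate the quantization error for each candidate and keep the one with the lowest error, so a larger steps value trades search time for a finer grid.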
- class nncf.quantization.advanced_parameters.AdvancedScaleEstimationParameters
Contains advanced parameters for scale estimation algorithm.
- Parameters:
subset_size (int) – The number of samples for scale estimation.
initial_steps (int) – The number of steps for absmax scale rectification.
scale_steps (int) – The number of steps for grid search scale rectification, from 1.0 to 1.0 - 0.05 * scale_steps.
weight_penalty (float) – Coefficient for the penalty between floating-point and compressed weights. If set to -1, the penalty is not applied.
- class nncf.quantization.advanced_parameters.AdvancedGPTQParameters
Contains advanced parameters for GPTQ algorithm.
- Parameters:
damp_percent (float) – The percent of the average Hessian diagonal to use for dampening. The recommended value is 0.1.
block_size (int) – The size of the blocks used during quantization. Defaults to 128.
subset_size (int) – Number of data samples to calculate Hessian. Defaults to 128.
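The role of damp_percent can be sketched with a tiny Hessian. The example follows the dampening step commonly used in GPTQ implementations, where damp_percent times the mean of the diagonal is added to every diagonal entry; treating damp_percent as a fraction (0.1 rather than 10) and the helper name `dampen_hessian` are assumptions.

```python
def dampen_hessian(hessian: list[list[float]], damp_percent: float = 0.1) -> list[list[float]]:
    """Return a copy of `hessian` with a dampening term added to its diagonal."""
    n = len(hessian)
    mean_diagonal = sum(hessian[i][i] for i in range(n)) / n
    damp = damp_percent * mean_diagonal
    # Only diagonal entries change; off-diagonal entries are copied as-is.
    return [[value + damp if i == j else value for j, value in enumerate(row)] for i, row in enumerate(hessian)]


hessian = [[2.0, 1.0], [1.0, 4.0]]
print(dampen_hessian(hessian, damp_percent=0.1))
```

Here the mean diagonal is 3.0, so roughly 0.3 is added to each diagonal entry, which keeps the matrix well conditioned before it is inverted.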
- class nncf.quantization.advanced_parameters.AdvancedLoraCorrectionParameters
Contains advanced parameters for LoRA correction algorithm.
- Parameters:
adapter_rank (int) – Rank of LoRA adapters. Defaults to 16.
num_iterations (int) – Number of correction iterations. Defaults to 3.
apply_regularization (bool) – Whether to add regularization during the correction process. Defaults to True. Helpful for large rank values to avoid overfitting.
subset_size (int) – Number of data samples for the LoRA correction algorithm. Defaults to 128.
use_int8_adapters (bool) – Whether to quantize LoRA adapters to 8 bits; otherwise they are kept in the original weights precision. Defaults to True.
- class nncf.quantization.advanced_parameters.AdvancedCompressionParameters
Contains advanced parameters for compression algorithms.
- Parameters:
statistics_path (str) – Directory path to dump statistics.
lora_adapter_rank (int) – Rank of LoRA adapters for FQ_LORA format. Defaults to 256.
group_size_fallback_mode (GroupSizeFallbackMode) – Specifies how to handle nodes that do not support the given group size.
min_adjusted_group_size (int) – Minimum group size for adjustable group size searching. Defaults to 16. The reason behind this argument is to avoid too small group size values, which may lead to performance issues.
awq_params (AdvancedAWQParameters) – Advanced parameters for AWQ algorithm.
scale_estimation_params (AdvancedScaleEstimationParameters) – Advanced parameters for Scale Estimation algorithm.
gptq_params (AdvancedGPTQParameters) – Advanced parameters for GPTQ algorithm.
lora_correction_params (AdvancedLoraCorrectionParameters) – Advanced parameters for LoRA Correction algorithm.
backend_params (dict[str, Any]) – Backend-specific parameters.
codebook (TTensor) – The codebook (LUT) for the weight compression. Applicable for vector quantization. Must be a numpy array or ov Tensor.
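As a usage sketch, the per-algorithm parameter classes nest inside AdvancedCompressionParameters, which is passed to nncf.compress_weights through its advanced_parameters argument. Field names are taken from the lists above; the values and the helper name `build_advanced_compression_parameters` are illustrative assumptions.

```python
def build_advanced_compression_parameters():
    """Nest per-algorithm settings in one object (requires the `nncf` package)."""
    # Imported lazily so the helper can be defined without NNCF installed.
    from nncf.quantization.advanced_parameters import (
        AdvancedAWQParameters,
        AdvancedCompressionParameters,
        AdvancedGPTQParameters,
        GroupSizeFallbackMode,
    )

    return AdvancedCompressionParameters(
        # Recompute the group size for nodes that cannot use the requested one.
        group_size_fallback_mode=GroupSizeFallbackMode.ADJUST,
        min_adjusted_group_size=32,
        # Illustrative values only.
        awq_params=AdvancedAWQParameters(subset_size=32, alpha_min=0.0, alpha_max=1.0, steps=100),
        gptq_params=AdvancedGPTQParameters(damp_percent=0.1, block_size=128, subset_size=128),
    )


# Typical call site; `model` and `calibration_dataset` are placeholders:
# compressed_model = nncf.compress_weights(
#     model,
#     mode=nncf.CompressWeightsMode.INT4_SYM,
#     group_size=128,
#     dataset=calibration_dataset,
#     awq=True,
#     advanced_parameters=build_advanced_compression_parameters(),
# )
```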
- class nncf.quantization.advanced_parameters.AdvancedAccuracyRestorerParameters
Contains advanced parameters for fine-tuning the accuracy restorer algorithm.
- Parameters:
max_num_iterations (int) – The maximum number of iterations of the algorithm. In other words, the maximum number of layers that may be reverted back to floating-point precision. By default, it is limited by the overall number of quantized layers.
tune_hyperparams (bool) – Whether to tune the quantization parameters as a preliminary step before reverting layers back to floating-point precision. It can bring an additional boost in performance and accuracy, at the cost of increased overall quantization time. The default value is False.
ranking_subset_size (Optional[int]) – Size of a subset that is used to rank layers by their contribution to the accuracy drop.
num_ranking_workers (Optional[int]) – The number of parallel workers that are used to rank quantization operations.
intermediate_model_dir (Optional[str]) – Path to the folder where the model, which was fully quantized with initial parameters, should be saved.
restore_mode (RestoreMode) – Specifies how to revert operations to their original precision.
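As a usage sketch, these parameters are passed to nncf.quantize_with_accuracy_control through its advanced_accuracy_restorer_parameters argument. Field names come from the list above; the values and the helper name `build_accuracy_restorer_parameters` are illustrative assumptions.

```python
def build_accuracy_restorer_parameters():
    """Limit the accuracy-restoring loop (requires the `nncf` package)."""
    # Imported lazily so the helper can be defined without NNCF installed.
    from nncf.quantization.advanced_parameters import AdvancedAccuracyRestorerParameters

    return AdvancedAccuracyRestorerParameters(
        max_num_iterations=20,    # revert at most 20 layers to floating point
        tune_hyperparams=False,   # skip the preliminary hyperparameter tuning
        ranking_subset_size=300,  # samples used to rank layers by accuracy impact
    )


# Typical call site; all positional arguments are placeholders:
# quantized_model = nncf.quantize_with_accuracy_control(
#     model,
#     calibration_dataset,
#     validation_dataset,
#     validation_fn,
#     max_drop=0.01,
#     advanced_accuracy_restorer_parameters=build_accuracy_restorer_parameters(),
# )
```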