The NNCF configuration file follows the JSON format and is the primary way to configure the result of NNCF application to a given user model. This configuration file is loaded into the NNCFConfig object by the user at runtime, after which the NNCFConfig is passed to the NNCF functions that perform actual compression or preparations for compression-aware training.
The NNCF JSON configuration file is usually set up on a per-model, per-compression use case basis to contain:
- a description of one or more compression algorithms to be applied to the model
- the configuration parameters for each of the chosen algorithms
- additional settings depending on the NNCF use case or integration scenario, e.g. parameters for accuracy-aware training, or the model input shape for frameworks such as PyTorch that do not, in general, encapsulate this data in the model object
- other parameters, the list of which may grow with the ongoing development of NNCF.
This schema serves as a reference for writing correct NNCF configuration files; every NNCF configuration file loaded into an NNCFConfig object is validated against it.
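For orientation, a minimal configuration file might look like the following sketch (the values are placeholders, and only a small subset of the properties described in this schema is shown):
{
    "input_info": {
        "sample_size": [1, 3, 224, 224]
    },
    "target_device": "ANY",
    "compression": {
        "algorithm": "quantization"
    }
}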
Describe the specifics of your model inputs here. This information is used to build the internal graph representation that is leveraged for proper compression functioning, and for exporting the compressed model to an executable format.
If this field is unspecified, NNCF will try to deduce the input shapes and tensor types for graph-building purposes based on the dataloader objects that are passed to compression algorithms by the user.
Shape of the tensor expected as input to the model.
No Additional Items
[
    1,
    3,
    224,
    224
]
Data type of the model input tensor.
Determines what the tensor will be filled with when passed to the model during tracing and exporting.
Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
Shape of the tensor expected as input to the model.
No Additional Items
[
    1,
    3,
    224,
    224
]
Data type of the model input tensor.
Determines what the tensor will be filled with when passed to the model during tracing and exporting.
Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
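As a sketch of the fields above, a model taking a positional image tensor plus a keyword-passed mask tensor might be described as follows (shapes, types and the keyword name are placeholders):
"input_info": [
    {
        "sample_size": [1, 3, 224, 224],
        "type": "float"
    },
    {
        "sample_size": [1, 224, 224],
        "type": "long",
        "keyword": "mask"
    }
]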
The target device, the specificity of which will be taken into account while compressing in order to obtain the best performance for this type of device. The default 'ANY' means compatible quantization supported by any HW. Set this value to 'TRIAL' if you are going to use a custom quantization schema.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
"quantization"
Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No Additional Properties
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
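For example, BN adaptation over roughly 2000 samples could be requested as in the sketch below (the sample count is a placeholder):
"initializer": {
    "batchnorm_adaptation": {
        "num_bn_adaptation_samples": 2000
    }
}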
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
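Putting the range-initialization fields above together, a per-group specification might look like this sketch (all numeric values are placeholders):
"initializer": {
    "range": [
        {
            "type": "percentile",
            "num_init_samples": 300,
            "params": {
                "min_percentile": 0.1,
                "max_percentile": 99.9
            },
            "target_quantizer_group": "activations"
        },
        {
            "type": "mean_min_max",
            "num_init_samples": 300,
            "target_quantizer_group": "weights"
        }
    ]
}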
This initializer performs advanced selection of the bitwidth for each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.
No Additional Properties
Type of precision initialization.
Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md
Specific value:"hawq"
Applies AutoQ algorithm to determine best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md
Specific value:"autoq"
Allows the user to manually specify, via the following config options, the exact bitwidth for each quantizer location.
Specific value:"manual"
A list of bitwidths to choose from when performing precision initialization. Overrides the bits constraints specified in the weight and activation sections.
[
4,
8
]
Number of data points to iteratively estimate Hessian trace.
Maximum number of iterations of the Hutchinson algorithm to estimate the Hessian trace.
Minimum relative tolerance for stopping the Hutchinson algorithm, calculated between the mean average trace of the previous iteration and that of the current one.
For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, each being the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniformly INT8-quantized model would have a compression_ratio of 0.25, and a uniformly INT4-quantized model would have a compression_ratio of 0.125.
The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the AutoQ evaluation data loader registered via register_default_init_args.
The number of random policy steps at the beginning of AutoQ precision initialization used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.
No Additional Items
A tuple of a bitwidth and a scope of the quantizer to assign the bitwidth to.
No Additional Items
[
[
2,
"ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
],
[
8,
"ResNet/ReLU[relu]/relu__0|OUTPUT"
]
]
Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.
Whether to dump data related to the Precision Initialization algorithm. The HAWQ dump includes the bitwidth graph, average traces and various plots. The AutoQ dump includes the DDPG agent learning trajectory in TensorBoard and mixed-precision environment metadata.
The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their output as input into one or more of the same modules (weight quantizers count as well) will have the same bitwidth; the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.
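As a sketch combining the fields above, a HAWQ-based precision initializer might be configured as follows (all values are placeholders, and the field names should be checked against this schema):
"initializer": {
    "precision": {
        "type": "hawq",
        "bits": [4, 8],
        "num_data_points": 100,
        "iter_number": 200,
        "tolerance": 1e-4,
        "compression_ratio": 1.5
    }
}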
The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.
Whether the model inputs should be immediately quantized prior to any other model operations.
Whether the model outputs should be additionally quantized.
Constraints to be applied to model weights quantization only.
No Additional Properties
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
Constraints to be applied to model activations quantization only.
No Additional Properties
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
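For instance, symmetric per-channel weight quantization combined with asymmetric activation quantization might be sketched as follows (the bit values are placeholders):
"weights": {
    "mode": "symmetric",
    "bits": 8,
    "per_channel": true
},
"activations": {
    "mode": "asymmetric",
    "bits": 8
}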
Specifies operations in the model which will share the same quantizer module for activations. This is helpful when one and the same quantizer scale is required for each input of an operation. Each sub-array defines a group of model operation inputs that have to share a single actual quantization module; each entry in the sub-array should correspond to exactly one node in the NNCF graph, and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of that sub-array.
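A sketch of such a grouping is shown below; the enclosing key name ("unified_scale_ops") and the node names are assumptions used purely for illustration - consult this schema for the exact property name:
"unified_scale_ops": [
    [
        "Model/cat_0",
        "Model/cat_1"
    ]
]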
No Additional Items
This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.
No Additional Properties
{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression: .*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression: .*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX-standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only), or to false to export to OpenVINO-supported FakeQuantize ONNX nodes (all quantization settings supported).
This option controls whether to apply the overflow issue fix for the given NNCF config. If set to 'disable', the fix will not be applied. If set to 'enable' or 'first_layer_only', and an appropriate target_device is chosen, the fix will be applied to all layers or to the first convolutional layer, respectively.
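In the quantization section, these two export-related options might appear as in this sketch (key names as used in NNCF configs; verify against this schema):
"export_to_onnx_standard_ops": false,
"overflow_fix": "first_layer_only"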
Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.
No Additional Properties
Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.
A zero-based index of the epoch, upon reaching which the activations will start to be quantized.
Epoch index upon which the weights will start to be quantized.
Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.
Duration, in epochs, of the learning rate dropping process.
Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.
Initial value of learning rate.
Initial value of weight decay.
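Taken together, the staged-scheduler fields above might be sketched as follows (the individual key names, e.g. "activations_quant_start_epoch", are assumptions based on common NNCF configs, and all values are placeholders):
"params": {
    "batch_multiplier": 1,
    "activations_quant_start_epoch": 1,
    "weights_quant_start_epoch": 2,
    "lr_poly_drop_start_epoch": 20,
    "lr_poly_drop_duration_epochs": 10,
    "disable_wd_start_epoch": 20,
    "base_lr": 3.1e-4,
    "base_wd": 1e-5
}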
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
"experimental_quantization"
Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No Additional Properties
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
This initializer performs advanced selection of the bitwidth for each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.
No Additional Properties
Type of precision initialization.
Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md
Specific value:"hawq"
Applies AutoQ algorithm to determine best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md
Specific value:"autoq"
Allows the user to manually specify, via the following config options, the exact bitwidth for each quantizer location.
Specific value:"manual"
A list of bitwidths to choose from when performing precision initialization. Overrides the bits constraints specified in the weight and activation sections.
[
4,
8
]
Number of data points to iteratively estimate Hessian trace.
Maximum number of iterations of the Hutchinson algorithm to estimate the Hessian trace.
Minimum relative tolerance for stopping the Hutchinson algorithm, calculated between the mean average trace of the previous iteration and that of the current one.
For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, each being the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniformly INT8-quantized model would have a compression_ratio of 0.25, and a uniformly INT4-quantized model would have a compression_ratio of 0.125.
The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the AutoQ evaluation data loader registered via register_default_init_args.
The number of random policy steps at the beginning of AutoQ precision initialization used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.
No Additional Items
A tuple of a bitwidth and a scope of the quantizer to assign the bitwidth to.
No Additional Items
[
[
2,
"ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
],
[
8,
"ResNet/ReLU[relu]/relu__0|OUTPUT"
]
]
Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.
Whether to dump data related to the Precision Initialization algorithm. The HAWQ dump includes the bitwidth graph, average traces and various plots. The AutoQ dump includes the DDPG agent learning trajectory in TensorBoard and mixed-precision environment metadata.
The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their output as input into one or more of the same modules (weight quantizers count as well) will have the same bitwidth; the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.
The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.
Whether the model inputs should be immediately quantized prior to any other model operations.
Whether the model outputs should be additionally quantized.
Constraints to be applied to model weights quantization only.
No Additional Properties
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
Constraints to be applied to model activations quantization only.
No Additional Properties
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items
Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
Specifies operations in the model which will share the same quantizer module for activations. This is helpful when one and the same quantizer scale is required for each input of an operation. Each sub-array defines a group of model operation inputs that have to share a single actual quantization module; each entry in the sub-array should correspond to exactly one node in the NNCF graph, and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of that sub-array.
No Additional Items
This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.
No Additional Properties
{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression: .*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using the average (across initialization samples) of per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the average of per-sample maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
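As a sketch, assuming these fields sit side by side in an algorithm section, the scope controls described above could be combined as follows; the scope strings are reused from the examples above, and 'validate_scopes' is assumed to be the name of the strictness flag described in the previous line:
"ignored_scopes": [
    "LeNet/relu_0",
    "LeNet/relu_1"
],
"target_scopes": [
    "UNet/ModuleList\\[up_path\\].*"
],
"validate_scopes": true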
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression:.*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values of a single sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize the inputs of this quantizer per each channel of the input tensor (per the 0-th dimension for weight quantization, and per the 1-st dimension for activation quantization).
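Putting the per-quantizer properties above together, a per-group quantizer configuration might look like the following sketch; the 'weights' group name and the property names follow common NNCF quantization configs, and the values are arbitrary:
"weights": {
    "mode": "symmetric",
    "bits": 8,
    "signed": true,
    "per_channel": true
}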
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either across the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using the average (across initialization samples) of per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the average of per-sample maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either across the entire set of tensor values, or collected and applied separately for each per-channel subset of values.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using the average (across initialization samples) of per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the average of per-sample maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX-standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only), or to false to export to OpenVINO-supported FakeQuantize ONNX nodes (all quantization settings supported).
This option controls whether to apply the overflow issue fix for the appropriate NNCF config. If set to 'disable', the fix will not be applied. If set to 'enable' or 'first_layer_only', and an appropriate target_device is chosen, the fix will be applied to all layers or to the first convolutional layer only, respectively.
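As an illustrative, non-authoritative example, the two export-related options described above might be set as follows; 'export_to_onnx_standard_ops' and 'overflow_fix' are the property names these descriptions are assumed to correspond to:
"export_to_onnx_standard_ops": false,
"overflow_fix": "enable"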
Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.
No Additional Properties
Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than can be accommodated by GPUs.
A zero-based index of the epoch, upon reaching which the activations will start to be quantized.
Epoch index upon which the weights will start to be quantized.
Epoch index upon which the learning rate will start to be dropped. If unspecified, the learning rate will not be dropped.
Duration, in epochs, of the learning rate dropping process.
Epoch index upon which weight decay will be disabled in the optimizer. If unspecified, weight decay will not be disabled.
Initial value of learning rate.
Initial value of weight decay.
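A sketch of a staged quantization scheduler 'params' section combining the fields above; the property names are the ones these descriptions are assumed to map to in NNCF configs, and the values are illustrative only:
"params": {
    "batch_multiplier": 1,
    "activations_quant_start_epoch": 1,
    "weights_quant_start_epoch": 2,
    "lr_poly_drop_start_epoch": 20,
    "lr_poly_drop_duration_epochs": 10,
    "disable_wd_start_epoch": 20,
    "base_lr": 1e-4,
    "base_wd": 1e-5
}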
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items
[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
Options for the execution of the NNCF-powered 'Accuracy Aware' training pipeline. The 'mode' property determines the mode of the accuracy-aware training execution and further available parameters.
Early exit mode schema. See EarlyExitTraining.md for more general info on this mode.
No Additional Properties"early_exit"
Maximum allowed accuracy degradation of the model, in percent, relative to the original model accuracy.
Maximum allowed accuracy degradation of the model, in units of the absolute metric of the original model.
The maximum total fine-tuning epoch count. If the accuracy criteria are not reached during fine-tuning, the most accurate model will be returned.
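For reference, an 'early_exit' accuracy-aware training section built from the fields above might look like this sketch; the parameter names ('maximal_relative_accuracy_degradation', 'maximal_total_epochs') are assumed, and the values are placeholders:
"accuracy_aware_training": {
    "mode": "early_exit",
    "params": {
        "maximal_relative_accuracy_degradation": 1.0,
        "maximal_total_epochs": 100
    }
}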
Adaptive compression level training mode schema. See AdaptiveCompressionLevelTraining.md for more general info on this mode.
No Additional Properties"adaptive_compression_level"
Maximum allowed accuracy degradation of the model, in percent, relative to the original model accuracy.
Maximum allowed accuracy degradation of the model, in units of the absolute metric of the original model.
Number of epochs to fine-tune during the initial training phase of the adaptive compression training loop.
Initial value of the compression rate increase/decrease step for the corresponding training phase of the compression training loop.
Factor used to reduce the compression rate change step in the adaptive compression training loop.
Factor used to reduce the learning rate after the compression rate step is reduced.
The minimal compression rate change step value after which the training loop is terminated.
The number of epochs to fine-tune the model for a given compression rate after the initial training phase of the training loop.
The maximum total fine-tuning epoch count. If the epoch counter reaches this number, the fine-tuning process will stop and the model with the largest compression rate will be returned.
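Similarly, a sketch of an 'adaptive_compression_level' section; the parameter names are assumed to correspond to the descriptions above, and every value is a placeholder rather than a tuned setting:
"accuracy_aware_training": {
    "mode": "adaptive_compression_level",
    "params": {
        "maximal_relative_accuracy_degradation": 1.0,
        "initial_training_phase_epochs": 5,
        "initial_compression_rate_step": 0.1,
        "compression_rate_step_reduction_factor": 0.5,
        "lr_reduction_factor": 0.5,
        "minimal_compression_rate_step": 0.025,
        "patience_epochs": 3,
        "maximal_total_epochs": 100
    }
}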
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
[Deprecated] Whether to enable strict input tensor shape matching when building the internal graph representation of the model. Set this to false if your model inputs have any variable dimension other than the 0-th (batch) dimension, or if any non-batch dimension of the intermediate tensors in your model execution flow depends on the input dimensions; otherwise, the compression will most likely fail.
Log directory for NNCF-specific logging outputs.
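Tying the top-level options together, a minimal configuration using fields from this section might look like the following sketch (all values illustrative):
{
    "compression": {
        "algorithm": "quantization"
    },
    "compression_lr_multiplier": 10,
    "log_dir": "./nncf_logs"
}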