NNCF configuration file schema

Type: object

The NNCF configuration file follows the JSON format and is the primary way to configure the result of NNCF application to a given user model. This configuration file is loaded into the NNCFConfig object by the user at runtime, after which the NNCFConfig is passed to the NNCF functions that perform actual compression or preparations for compression-aware training.

The NNCF JSON configuration file is usually set up on a per-model, per-compression use case basis to contain:
- a description of one or more compression algorithms to be applied to the model
- the configuration parameters for each of the chosen algorithms
- additional settings depending on the NNCF use case or integration scenario, e.g. parameters for accuracy-aware training, or the model input shape for frameworks that do not in general encapsulate this data in the model object (such as PyTorch)
- other parameters, the list of which may be extended as NNCF development progresses.

This schema serves as a reference for users writing NNCF configuration files; each NNCF configuration file loaded into an NNCFConfig object is validated against it.
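For illustration, a minimal configuration file combining the sections described below might look as follows (the property names "input_info", "target_device", "compression" and "algorithm" are assumed from the NNCF schema; verify them against the schema body below and your NNCF version):

```json
{
    "input_info": {
        "sample_size": [1, 3, 224, 224]
    },
    "target_device": "ANY",
    "compression": {
        "algorithm": "quantization"
    }
}
```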


Describe the specifics of your model inputs here. This information is used to build the internal graph representation that is leveraged for proper compression functioning, and for exporting the compressed model to an executable format.
If this field is unspecified, NNCF will try to deduce the input shapes and tensor types for graph building purposes based on the dataloader objects that are passed to the compression algorithms by the user.

single_object_version

Type: object
No Additional Properties

Type: array of number

Shape of the tensor expected as input to the model.

No Additional Items

Each item of this array must be:


Example:

[
    1,
    3,
    224,
    224
]

Type: string

Data type of the model input tensor.

Type: string

Determines what the tensor will be filled with when passed to the model during tracing and exporting.

Type: string

Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
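The four properties above might be combined in a single-object model input description like this sketch (the property names "sample_size", "type", "filler" and "keyword", and the "zeros" filler value, are assumed from the NNCF schema; check them against your NNCF version):

```json
{
    "input_info": {
        "sample_size": [1, 3, 224, 224],
        "type": "float",
        "filler": "zeros",
        "keyword": "input_tensor"
    }
}
```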

array_of_objects_version

Type: array of object
No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: array of number

Shape of the tensor expected as input to the model.

No Additional Items

Each item of this array must be:


Example:

[
    1,
    3,
    224,
    224
]

Type: string

Data type of the model input tensor.

Type: string

Determines what the tensor will be filled with when passed to the model during tracing and exporting.

Type: string

Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
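For models with multiple inputs, the array-of-objects form lists one such object per input. A hypothetical two-input sketch (property names and the "long" type value assumed from the NNCF schema):

```json
{
    "input_info": [
        { "sample_size": [1, 3, 224, 224], "keyword": "image" },
        { "sample_size": [1, 128], "type": "long", "keyword": "token_ids" }
    ]
}
```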

Type: enum (of string) Default: "ANY"

The target device, the specificity of which will be taken into account while compressing in order to obtain the best performance for this type of device. The default 'ANY' means compatible quantization supported by any HW. Set this value to 'TRIAL' if you are going to use a custom quantization schema.

Must be one of:

  • "ANY"
  • "CPU"
  • "GPU"
  • "NPU"
  • "TRIAL"
  • "CPU_SPR"


single_object_version


Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "quantization"

Type: object

Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.

No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be rounded to the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
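For example, BN adaptation could be disabled inside the quantization algorithm's initializer section as follows (the "batchnorm_adaptation" and "num_bn_adaptation_samples" property names are assumed from the NNCF schema):

```json
{
    "initializer": {
        "batchnorm_adaptation": {
            "num_bn_adaptation_samples": 0
        }
    }
}
```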


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' initializers do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or be collected and applied separately for each per-channel subset of values.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; the maximum quantizer range is initialized analogously from the averaged maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set equal to +/- 3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set equal to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
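A hypothetical global range initialization section using the 'percentile' type with custom percentile bounds (the "range", "num_init_samples", "type", "params", "min_percentile" and "max_percentile" property names are assumed from the NNCF schema):

```json
{
    "initializer": {
        "range": {
            "num_init_samples": 300,
            "type": "percentile",
            "params": {
                "min_percentile": 0.5,
                "max_percentile": 99.5
            }
        }
    }
}
```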

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' initializers do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or be collected and applied separately for each per-channel subset of values.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; the maximum quantizer range is initialized analogously from the averaged maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set equal to +/- 3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set equal to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

This initializer performs advanced selection of bitwidth per each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.

No Additional Properties


Type of precision initialization.

hawq

Type: const

Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian-based calculation approach. For more details see Quantization.md

Specific value: "hawq"

autoq

Type: const

Applies AutoQ algorithm to determine best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md

Specific value: "autoq"

manual

Type: const

Allows manually specifying the exact bitwidth for each quantizer location via the following config options.

Specific value: "manual"

Type: array of number Default: [2, 4, 8]

A list of bitwidths to choose from when performing precision initialization. Overrides the bitwidth constraints specified in the weight and activation sections.

No Additional Items

Each item of this array must be:


Example:

[
    4,
    8
]

Type: number Default: 100

Number of data points used to iteratively estimate the Hessian trace.

Type: number Default: 200

Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.

Type: number Default: 0.0001

Minimum relative tolerance for stopping the Hutchinson algorithm. It is calculated between the mean average trace of the previous iteration and that of the current one.

Type: number

For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, each defined as the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a compression_ratio equal to 0.25, and a uniform INT4-quantized model would have a compression_ratio equal to 0.125.

Type: number Default: 1.0

The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the registered autoq_eval_loader via register_default_init_args.

Type: number Default: 20

The number of random policies used at the beginning of AutoQ precision initialization to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.

Type: array of array

Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.

No Additional Items

Each item of this array must be:


Example:

[
    [
        2,
        "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
    ],
    [
        8,
        "ResNet/ReLU[relu]/relu__0|OUTPUT"
    ]
]

Type: string

Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.

Type: boolean Default: false

Whether to dump data related to the precision initialization algorithm. The HAWQ dump includes a bitwidth graph, average traces and various plots. The AutoQ dump includes the DDPG agent's learning trajectory in TensorBoard and mixed-precision environment metadata.

Type: enum (of string) Default: "liberal"

The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their outputs to one and the same set of modules as input (weight quantizers count as well) will have the same bitwidth; the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.

Must be one of:

  • "strict"
  • "liberal"

Type: enum (of string) Default: "performance"

The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.

Must be one of:

  • "performance"
  • "mixed"

Type: boolean Default: true

Whether the model inputs should be immediately quantized prior to any other model operations.

Type: boolean Default: false

Whether the model outputs should be additionally quantized.

Type: object

Constraints to be applied to model weights quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
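The weight quantization constraints above could be combined into a sketch like the following (the "weights", "mode", "bits", "per_channel" and "ignored_scopes" property names are assumed from the NNCF schema; the scope pattern is purely illustrative):

```json
{
    "weights": {
        "mode": "symmetric",
        "bits": 8,
        "per_channel": true,
        "ignored_scopes": ["{re}.*skip_connection.*"]
    }
}
```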

Type: object

Constraints to be applied to model activations quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: array of array

Specifies operations in the model which will share the same quantizer module for activations. This is helpful in case one and the same quantizer scale is required for each input of these operations. Each sub-array defines a group of model operation inputs that have to share a single actual quantization module; each entry in the sub-array should correspond to exactly one node in the NNCF graph, and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of that sub-array.

No Additional Items

Each item of this array must be:
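A hypothetical sketch of such a grouping (the "unified_scale_ops" property name is assumed from the NNCF schema, and the node names are purely illustrative):

```json
{
    "unified_scale_ops": [
        [
            "ResNet/__add___0",
            "ResNet/__add___1"
        ]
    ]
}
```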

Type: object

This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.

No Additional Properties
Example:

{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        },
        "activations": {
            "{re}.*conv_first.*": {
                "mode": "asymmetric"
            },
            "{re}.*conv_second.*": {
                "mode": "symmetric"
            }
        }
    }
}

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' initializers do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or be collected and applied separately for each per-channel subset of values.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; the maximum quantizer range is initialized analogously from the averaged maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set equal to +/- 3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set equal to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' initializers do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or be collected and applied separately for each per-channel subset of values.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; the maximum quantizer range is initialized analogously from the averaged maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set equal to +/- 3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set equal to the averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
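Taken together, the mode, bitwidth, signedness, and per-channel options above form one quantizer constraint group, e.g. for weights or activations. A minimal illustrative fragment (key names follow the NNCF quantization schema conventions; treat this as a sketch rather than a definitive reference):

```json
{
    "mode": "asymmetric",
    "bits": 8,
    "signed": true,
    "per_channel": false
}
```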

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
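The global range initialization options above can be sketched as a single config fragment (key names such as num_init_samples and type follow the NNCF schema conventions; treat the exact spellings as illustrative):

```json
{
    "initializer": {
        "range": {
            "num_init_samples": 256,
            "type": "percentile",
            "params": {
                "min_percentile": 0.1,
                "max_percentile": 99.9
            }
        }
    }
}
```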

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: boolean Default: false

[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX-standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only), or to false to export to OpenVINO-supported FakeQuantize ONNX (all quantization settings supported).

Type: enum (of string) Default: "enable"

This option controls whether to apply the overflow issue fix for the given NNCF config. If set to disable, the fix will not be applied. If set to enable or first_layer_only, and an appropriate target_device is chosen, the fix will be applied to all layers or only to the first convolutional layer, respectively.

Must be one of:

  • "enable"
  • "disable"
  • "first_layer_only"

Type: object

Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.

No Additional Properties

Type: number

Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.

Type: number Default: 1

A zero-based index of the epoch, upon reaching which the activations will start to be quantized.

Type: number Default: 1

Epoch index upon which the weights will start to be quantized.

Type: number

Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.

Type: number Default: 30

Duration, in epochs, of the learning rate dropping process.

Type: number

Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.

Type: number Default: 0.001

Initial value of learning rate.

Type: number Default: 1e-05

Initial value of weight decay.
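A staged quantization scheduler section combining the epoch and learning-rate options above might look as follows (the key names are illustrative assumptions matching the parameter descriptions; values mirror the stated defaults where available):

```json
{
    "params": {
        "batch_multiplier": 2,
        "activations_quant_start_epoch": 1,
        "weights_quant_start_epoch": 2,
        "lr_poly_drop_start_epoch": 10,
        "lr_poly_drop_duration_epochs": 30,
        "disable_wd_start_epoch": 20,
        "base_lr": 0.001,
        "base_wd": 1e-05
    }
}
```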

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "experimental_quantization"

Type: object

Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.

No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
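A batch norm statistics adaptation initializer can be sketched as follows (the key names below follow the NNCF initializer schema conventions; treat them as illustrative):

```json
{
    "initializer": {
        "batchnorm_adaptation": {
            "num_bn_adaptation_samples": 2000
        }
    }
}
```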


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

This initializer performs advanced selection of bitwidth per each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.

No Additional Properties


Type of precision initialization.

hawq

Type: const

Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md

Specific value: "hawq"

autoq

Type: const

Applies AutoQ algorithm to determine best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md

Specific value: "autoq"

manual

Type: const

Allows manually specifying the exact bitwidth for each quantizer location via the following config options.

Specific value: "manual"

Type: array of number Default: [2, 4, 8]

A list of bitwidths to choose from when performing precision initialization. Overrides bits constraints specified in the weight and activation sections.

No Additional Items

Each item of this array must be:


Example:

[
    4,
    8
]

Type: number Default: 100

Number of data points used to iteratively estimate the Hessian trace.

Type: number Default: 200

Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.

Type: number Default: 0.0001

Minimum relative tolerance for stopping the Hutchinson algorithm. It is calculated between the mean average trace of the previous iteration and that of the current one.

Type: number

For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of the mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, each computed as the layer's FLOPs multiplied by the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameters size in FP32. E.g. a uniform INT8-quantized model would have a compression_ratio equal to 0.25, and a uniform INT4-quantized model would have a compression_ratio equal to 0.125.

Type: number Default: 1.0

The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the registered autoq_eval_loader via register_default_init_args.

Type: number Default: 20

The number of random policy steps at the beginning of AutoQ precision initialization used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
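The precision initializer options above can be combined into a single fragment, e.g. for the hawq type (the key names here are illustrative assumptions matching the parameter descriptions):

```json
{
    "initializer": {
        "precision": {
            "type": "hawq",
            "bits": [2, 4, 8],
            "num_data_points": 100,
            "iter_number": 200,
            "tolerance": 0.0001,
            "compression_ratio": 1.5
        }
    }
}
```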

Type: array of array

Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.

No Additional Items

Each item of this array must be:


Example:

[
    [
        2,
        "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
    ],
    [
        8,
        "ResNet/ReLU[relu]/relu__0|OUTPUT"
    ]
]

Type: string

Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.

Type: boolean Default: false

Whether to dump data related to Precision Initialization algorithm. HAWQ dump includes bitwidth graph, average traces and different plots. AutoQ dump includes DDPG agent learning trajectory in tensorboard and mixed-precision environment metadata.

Type: enum (of string) Default: "liberal"

The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their outputs as inputs to one and the same set of modules (weight quantizers count as well) will have the same bitwidth; the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.

Must be one of:

  • "strict"
  • "liberal"

Type: enum (of string) Default: "performance"

The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.

Must be one of:

  • "performance"
  • "mixed"

Type: boolean Default: true

Whether the model inputs should be immediately quantized prior to any other model operations.

Type: boolean Default: false

Whether the model outputs should be additionally quantized.
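Putting the top-level quantization options together, a minimal quantization algorithm section might look as follows (a sketch: key names follow the NNCF config format, values are purely illustrative):

```json
{
    "compression": {
        "algorithm": "quantization",
        "preset": "mixed",
        "quantize_inputs": true,
        "quantize_outputs": false,
        "overflow_fix": "enable",
        "weights": {
            "mode": "symmetric",
            "per_channel": true
        },
        "activations": {
            "mode": "asymmetric"
        }
    }
}
```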

Type: object

Constraints to be applied to model weights quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: object

Constraints to be applied to model activations quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: array of array

Specifies operations in the model which will share the same quantizer module for activations. This is helpful when one and the same quantizer scale is required for each input of an operation. Each sub-array defines a group of model operation inputs that have to share a single actual quantization module; each entry in the sub-array should correspond to exactly one node in the NNCF graph, and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of that sub-array.

No Additional Items

Each item of this array must be:

Type: object

This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently from the rest of the model. Any other automatic or group-wise settings will be overridden.

No Additional Properties
Example:

{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
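Putting the global range initialization fields together, a configuration entry might look as follows. This is a sketch: the section and property names (range, num_init_samples, min_percentile, max_percentile) are assumptions inferred from the descriptions above rather than verbatim schema names:

```json
{
    "initializer": {
        "range": {
            "num_init_samples": 256,
            "type": "percentile",
            "params": {
                "min_percentile": 0.1,
                "max_percentile": 99.9
            }
        }
    }
}
```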

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"
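The per-layer variant accepts an array of such entries, each optionally restricted to a quantizer group and a set of scopes. A sketch with the same naming assumptions as above (range, target_scopes, target_quantizer_group are inferred from the descriptions, not verbatim schema names):

```json
{
    "initializer": {
        "range": [
            {
                "type": "mean_min_max",
                "num_init_samples": 256,
                "target_quantizer_group": "weights",
                "target_scopes": ["{re}.*conv.*"]
            },
            {
                "type": "threesigma",
                "target_quantizer_group": "activations"
            }
        ]
    }
}
```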

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values of one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of the input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either over the entire set of tensor values, or separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: boolean Default: false

[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to standard ONNX QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only), or to false to export to OpenVINO-supported FakeQuantize ONNX nodes (all quantization settings supported).

Type: enum (of string) Default: "enable"

This option controls whether or not to apply the overflow issue fix for the given NNCF config. If set to disable, the fix will not be applied. If set to enable or first_layer_only, and an appropriate target_device is chosen, the fix will be applied to all layers or only to the first convolutional layer, respectively.

Must be one of:

  • "enable"
  • "disable"
  • "first_layer_only"
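For instance, applying the fix only to the first convolutional layer might look as follows; this sketch assumes the property is spelled overflow_fix, which the description above suggests but does not state explicitly:

```json
{
    "compression": {
        "algorithm": "quantization",
        "overflow_fix": "first_layer_only"
    }
}
```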

Type: object

Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.

No Additional Properties

Type: number

Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.

Type: number Default: 1

A zero-based index of the epoch, upon reaching which the activations will start to be quantized.

Type: number Default: 1

Epoch index upon which the weights will start to be quantized.

Type: number

Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.

Type: number Default: 30

Duration, in epochs, of the learning rate dropping process.

Type: number

Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.

Type: number Default: 0.001

Initial value of learning rate.

Type: number Default: 1e-05

Initial value of weight decay.
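A staged quantization scheduler section combining the parameters above might be sketched as follows. All property spellings here (e.g. batch_multiplier, activations_quant_start_epoch, base_lr) are assumptions derived from the parameter descriptions, not verbatim schema names:

```json
{
    "params": {
        "batch_multiplier": 2,
        "activations_quant_start_epoch": 1,
        "weights_quant_start_epoch": 2,
        "lr_poly_drop_start_epoch": 20,
        "lr_poly_drop_duration_epochs": 30,
        "disable_wd_start_epoch": 20,
        "base_lr": 0.001,
        "base_wd": 1e-05
    }
}
```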

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

array_of_objects_version

Type: array
No Additional Items

Each item of this array must be:


Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i0
Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i1


Options for the execution of the NNCF-powered 'Accuracy Aware' training pipeline. The 'mode' property determines the mode of the accuracy-aware training execution and further available parameters.

early_exit

Type: object

Early exit mode schema. See EarlyExitTraining.md for more general info on this mode.

No Additional Properties

Type: const
Specific value: "early_exit"


No Additional Properties

Type: object

The following properties are required:

  • maximal_relative_accuracy_degradation
Type: object

The following properties are required:

  • maximal_absolute_accuracy_degradation

Type: number

Maximally allowed accuracy degradation of the model in percent relative to the original model accuracy.

Type: number

Maximally allowed accuracy degradation of the model in units of absolute metric of the original model.

Type: number Default: 10000

The maximal total fine-tuning epoch count. If the accuracy criteria are not met during fine-tuning, the most accurate model will be returned.
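A complete early-exit accuracy-aware training section might look like the following sketch. The accuracy_aware_training section name and the maximal_total_epochs spelling are assumptions; the degradation criterion names are the required properties listed above:

```json
{
    "accuracy_aware_training": {
        "mode": "early_exit",
        "params": {
            "maximal_relative_accuracy_degradation": 1.0,
            "maximal_total_epochs": 100
        }
    }
}
```

Note that exactly one of the relative or absolute degradation criteria should be specified.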

adaptive_compression_level

Type: object

Adaptive compression level training mode schema. See AdaptiveCompressionLevelTraining.md for more general info on this mode.

No Additional Properties

Type: const
Specific value: "adaptive_compression_level"


No Additional Properties

Type: object

The following properties are required:

  • maximal_relative_accuracy_degradation
Type: object

The following properties are required:

  • maximal_absolute_accuracy_degradation

Type: number

Maximally allowed accuracy degradation of the model in percent relative to the original model accuracy.

Type: number

Maximally allowed accuracy degradation of the model in units of absolute metric of the original model.

Type: number Default: 5

Number of epochs to fine-tune during the initial training phase of the adaptive compression training loop.

Type: number Default: 0.1

Initial step value for the compression rate increase/decrease training phase of the compression training loop.

Type: number Default: 0.5

Factor used to reduce the compression rate change step in the adaptive compression training loop.

Type: number Default: 0.5

Factor used to reduce the learning rate after the compression rate step is reduced.

Type: number Default: 0.025

The minimal compression rate change step value after which the training loop is terminated.

Type: number Default: 3

The number of epochs to fine-tune the model for a given compression rate after the initial training phase of the training loop.

Type: number Default: 10000

The maximal total fine-tuning epoch count. If the epoch counter reaches this number, the fine-tuning process will stop and the model with the largest compression rate will be returned.
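An adaptive compression level section combining the parameters above might be sketched as follows. All property spellings other than mode and the required degradation criteria are assumptions inferred from the parameter descriptions and defaults:

```json
{
    "accuracy_aware_training": {
        "mode": "adaptive_compression_level",
        "params": {
            "maximal_relative_accuracy_degradation": 1.0,
            "initial_training_phase_epochs": 5,
            "compression_rate_step": 0.1,
            "step_reduction_factor": 0.5,
            "lr_reduction_factor": 0.5,
            "minimal_compression_rate_step": 0.025,
            "patience_epochs": 3,
            "maximal_total_epochs": 10000
        }
    }
}
```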

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

Type: boolean

[Deprecated] Whether to enable strict input tensor shape matching when building the internal graph representation of the model. Set this to false if your model inputs have any variable dimension other than the 0-th (batch) dimension, or if any non-batch dimension of the intermediate tensors in your model execution flow depends on the input dimension; otherwise, the compression will most likely fail.

Type: string

Log directory for NNCF-specific logging outputs.