NNCF configuration file schema

Type: object

The NNCF configuration file follows the JSON format and is the primary way to configure how NNCF is applied to a given user model. The user loads this configuration file into an NNCFConfig object at runtime and then passes the NNCFConfig to the NNCF functions that perform the actual compression or the preparations for compression-aware training.

The NNCF JSON configuration file is usually set up on a per-model, per-compression use case basis to contain:
- a description of one or more compression algorithms to be applied to the model
- the configuration parameters for each of the chosen algorithms
- additional settings depending on the NNCF use case or integration scenario, e.g. parameters for accuracy-aware training, or the model input shape for frameworks that generally do not encapsulate this data in the model object (such as PyTorch)
and other parameters, the list of which may extend with the ongoing development of NNCF.

This schema serves as a reference for users writing NNCF configuration files; every NNCF configuration file loaded into an NNCFConfig object is validated against it.
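
As a rough illustration of the structure described above, the following Python sketch assembles a minimal configuration as a dictionary and round-trips it through a JSON file. The key names (`input_info`, `sample_size`, `compression`, `algorithm`) are assumptions based on common NNCF usage and are not spelled out in this excerpt. With NNCF installed, such a file would typically be loaded with `NNCFConfig.from_json`, which is not exercised here.

```python
import json
import os
import tempfile

# Hypothetical minimal configuration; the key names are assumptions,
# they do not appear verbatim in this part of the schema reference.
config = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
}


def write_config(cfg: dict, path: str) -> None:
    """Serialize the configuration dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)


def read_config(path: str) -> dict:
    """Read the JSON file back into a plain dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "nncf_config.json")
write_config(config, path)
loaded = read_config(path)
```

The sketch only checks that the file is well-formed JSON; validation against the schema itself happens inside NNCF when the file is loaded into an NNCFConfig object.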


Describe the specifics of your model inputs here. This information is used to build the internal graph representation that is leveraged for proper compression functioning, and for exporting the compressed model to an executable format.
If this field is unspecified, NNCF will try to deduce the input shapes and tensor types for the graph building purposes based on dataloader objects that are passed to compression algorithms by the user.

single_object_version

Type: object
No Additional Properties

Type: array of number

Shape of the tensor expected as input to the model.

No Additional Items

Each item of this array must be:


Example:

[
    1,
    3,
    224,
    224
]

Type: string

Data type of the model input tensor.

Type: string

Determines what the tensor will be filled with when passed to the model during tracing and exporting.

Type: string

Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.

array_of_objects_version

Type: array of object
No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: array of number

Shape of the tensor expected as input to the model.

No Additional Items

Each item of this array must be:


Example:

[
    1,
    3,
    224,
    224
]

Type: string

Data type of the model input tensor.

Type: string

Determines what the tensor will be filled with when passed to the model during tracing and exporting.

Type: string

Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
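
The two accepted forms (a single object or an array of objects) can be handled uniformly. The sketch below is an illustration only: the property names `sample_size`, `type`, `filler` and `keyword`, as well as the example values `"float"`, `"long"` and `"zeros"`, are assumptions that are not given in this excerpt.

```python
from typing import Any, Dict, List, Union

# Assumed names of the four properties described above (shape, data type,
# filler, keyword); "No Additional Properties" means nothing else is allowed.
ALLOWED_KEYS = {"sample_size", "type", "filler", "keyword"}

InputInfo = Union[Dict[str, Any], List[Dict[str, Any]]]


def normalize_input_info(input_info: InputInfo) -> List[Dict[str, Any]]:
    """Return the input description as a list, whichever form was given."""
    entries = [input_info] if isinstance(input_info, dict) else list(input_info)
    for entry in entries:
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"Unexpected properties: {sorted(unknown)}")
    return entries


# Single-object form: one input tensor.
single = {"sample_size": [1, 3, 224, 224]}

# Array-of-objects form: one entry per model input.
multiple = [
    {"sample_size": [1, 3, 224, 224], "type": "float", "filler": "zeros"},
    {"sample_size": [1, 128], "type": "long", "keyword": "attention_mask"},
]

single_entries = normalize_input_info(single)
multiple_entries = normalize_input_info(multiple)
```

In the second entry of `multiple`, the hypothetical `attention_mask` keyword would cause the tensor to be passed as a keyword argument rather than positionally.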

Type: enum (of string) Default: "ANY"

The target device whose specifics will be taken into account during compression in order to obtain the best performance on that type of device. The default 'ANY' selects quantization that is compatible with any hardware. Set this value to 'TRIAL' if you are going to use a custom quantization schema.

Must be one of:

  • "ANY"
  • "CPU"
  • "GPU"
  • "NPU"
  • "TRIAL"
  • "CPU_SPR"


single_object_version


Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "quantization"

Type: object

Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.

No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using the average (across all initialization samples) of the per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the corresponding average of the per-sample maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values in the tensor to be quantized, averaged across all initialization samples. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using the average (across all initialization samples) of the per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the corresponding average of the per-sample maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values in the tensor to be quantized, averaged across all initialization samples. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
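
The scope lists above accept either a single string or an array of strings, and the examples use a `{re}` prefix for regular expressions. The sketch below shows one plausible way such matching could work. It is an assumption-laden illustration rather than NNCF's code: it assumes that a `{re}`-prefixed entry is searched for as a regular expression, that other entries must match the node name exactly, and that ignored scopes take precedence over target scopes. The function names are hypothetical.

```python
import re
from typing import List, Optional, Union

Scopes = Union[str, List[str]]
RE_PREFIX = "{re}"


def matches_scope(node_name: str, scopes: Scopes) -> bool:
    """True if the node name matches any entry of the scope list."""
    if isinstance(scopes, str):
        scopes = [scopes]
    for scope in scopes:
        if scope.startswith(RE_PREFIX):
            # Assumption: the remainder is a regex searched within the name.
            if re.search(scope[len(RE_PREFIX):], node_name):
                return True
        elif scope == node_name:
            return True
    return False


def should_process(node_name: str,
                   ignored_scopes: Optional[Scopes] = None,
                   target_scopes: Optional[Scopes] = None) -> bool:
    """Apply the ignored scopes first, then restrict to the target scopes."""
    if ignored_scopes is not None and matches_scope(node_name, ignored_scopes):
        return False
    if target_scopes is not None and not matches_scope(node_name, target_scopes):
        return False
    return True


ignored = ["LeNet/relu_0", "LeNet/relu_1"]
relu_processed = should_process("LeNet/relu_0", ignored_scopes=ignored)
conv_processed = should_process("LeNet/conv1", ignored_scopes=ignored)
```

Under these assumptions, a scope entry that matches no node at all is what the scope validation option is meant to catch.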

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

This initializer performs advanced selection of bitwidth per each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.

No Additional Properties


Type of precision initialization.

hawq

Type: const

Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md

Specific value: "hawq"

autoq

Type: const

Applies AutoQ algorithm to determine best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md

Specific value: "autoq"

manual

Type: const

Allows manually specifying the exact bitwidth for each quantizer location via the config options that follow.

Specific value: "manual"

Type: array of number Default: [2, 4, 8]

A list of bitwidths to choose from when performing precision initialization. Overrides the bits constraints specified in the weight and activation sections.

No Additional Items

Each item of this array must be:


Example:

[
    4,
    8
]

Type: number Default: 100

Number of data points to iteratively estimate Hessian trace.

Type: number Default: 200

Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.

Type: number Default: 0.0001

Minimum relative tolerance for stopping the Hutchinson algorithm. It is calculated between the mean trace from the previous iteration and that from the current one.
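
For readers unfamiliar with the Hutchinson algorithm, the following is a minimal sketch of how the iteration limit and the relative tolerance can interact. It is not NNCF's implementation: an explicit small matrix stands in for the Hessian (in practice a Hessian-vector product would come from automatic differentiation), and the names `hutchinson_trace`, `iter_number`, `tolerance` and `make_matvec` are chosen here for illustration.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def make_matvec(matrix: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Build a matrix-vector product function for an explicit matrix."""
    def matvec(v: Vector) -> Vector:
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


def hutchinson_trace(matvec: Callable[[Vector], Vector], dim: int,
                     iter_number: int = 200, tolerance: float = 1e-4,
                     seed: int = 0) -> float:
    """Estimate trace(H) as the running mean of v^T H v over random +/-1 vectors."""
    rng = random.Random(seed)
    running_sum = 0.0
    mean_trace = 0.0
    for i in range(1, iter_number + 1):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec(v)
        running_sum += sum(a * b for a, b in zip(v, hv))
        new_mean = running_sum / i
        # Stop early once the mean changes by less than the relative tolerance.
        if i > 1 and abs(new_mean - mean_trace) / (abs(mean_trace) + 1e-6) < tolerance:
            return new_mean
        mean_trace = new_mean
    return mean_trace


diag_estimate = hutchinson_trace(make_matvec([[2.0, 0.0], [0.0, 3.0]]), dim=2)
dense_estimate = hutchinson_trace(make_matvec([[2.0, 1.0], [1.0, 3.0]]), dim=2)
```

For a diagonal matrix every probe vector yields the exact trace, so the loop stops after the second iteration; for a dense matrix the estimate fluctuates around the true trace until the tolerance or the iteration limit is reached.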

Type: number

For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified one. The bit complexity of the model is the sum of the bit complexities of its quantized layers, each of which is the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a compression_ratio equal to 0.25, and a uniform INT4-quantized model would have a compression_ratio equal to 0.125.
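
The two meanings of the ratio can be checked with a little arithmetic. The sketch below simply encodes the definitions given above; the function names and the per-layer numbers are made up for illustration.

```python
from typing import Sequence


def hawq_compression_ratio(flops_per_layer: Sequence[float],
                           bits_per_layer: Sequence[int]) -> float:
    """Bit complexity of a fully INT8 model divided by that of the mixed one."""
    int8_complexity = sum(flops * 8 for flops in flops_per_layer)
    mixed_complexity = sum(
        flops * bits for flops, bits in zip(flops_per_layer, bits_per_layer)
    )
    return int8_complexity / mixed_complexity


def autoq_compression_ratio(params_per_layer: Sequence[int],
                            bits_per_layer: Sequence[int]) -> float:
    """Quantized model size relative to the FP32 parameter size."""
    fp32_size = sum(params * 32 for params in params_per_layer)
    quantized_size = sum(
        params * bits for params, bits in zip(params_per_layer, bits_per_layer)
    )
    return quantized_size / fp32_size


# Hypothetical two-layer model: equal FLOPs, second layer lowered to 4 bits.
hawq_ratio = hawq_compression_ratio([100.0, 100.0], [8, 4])

# Uniform INT8 and uniform INT4 models reproduce the values quoted above.
int8_ratio = autoq_compression_ratio([10, 30], [8, 8])
int4_ratio = autoq_compression_ratio([10, 30], [4, 4])
```

Note that the two ratios move in opposite directions: a more aggressively quantized model has a larger ratio in the hawq sense and a smaller one in the autoq sense.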

Type: number Default: 1.0

The desired ratio of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the autoq_eval_loader registered via register_default_init_args.

Type: number Default: 20

The number of random-policy iterations at the beginning of AutoQ precision initialization, used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.

Type: array of array

Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.

No Additional Items

Each item of this array must be:


Example:

[
    [
        2,
        "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
    ],
    [
        8,
        "ResNet/ReLU[relu]/relu__0|OUTPUT"
    ]
]
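
Each entry of the manual setting pairs a bitwidth with a quantizer scope. As a small illustration, the sketch below turns such a list into a scope-to-bitwidth mapping and rejects malformed pairs. The helper name is hypothetical, and treating a repeated scope as an error is an assumption rather than documented behaviour.

```python
from typing import Dict, List, Sequence, Union

Entry = Sequence[Union[int, str]]


def parse_bitwidth_per_scope(entries: List[Entry]) -> Dict[str, int]:
    """Convert [[bits, scope], ...] into a {scope: bits} dictionary."""
    result: Dict[str, int] = {}
    for entry in entries:
        if len(entry) != 2:
            raise ValueError(f"Expected a [bits, scope] pair, got {entry!r}")
        bits, scope = entry
        if isinstance(bits, bool) or not isinstance(bits, int):
            raise ValueError(f"Bitwidth must be an integer, got {bits!r}")
        if not isinstance(scope, str):
            raise ValueError(f"Scope must be a string, got {scope!r}")
        if scope in result:
            # Assumption: a quantizer scope should be listed only once.
            raise ValueError(f"Scope listed twice: {scope}")
        result[scope] = bits
    return result


manual_bitwidths = parse_bitwidth_per_scope([
    [2, "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"],
    [8, "ResNet/ReLU[relu]/relu__0|OUTPUT"],
])
```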

Type: string

Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing the average Hessian traces from a previous run of the HAWQ algorithm.

Type: boolean Default: false

Whether to dump data related to Precision Initialization algorithm. HAWQ dump includes bitwidth graph, average traces and different plots. AutoQ dump includes DDPG agent learning trajectory in tensorboard and mixed-precision environment metadata.

Type: enum (of string) Default: "liberal"

The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their output to one or more of the same modules as input (weight quantizers count as well) will have the same bitwidth, whereas the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.

Must be one of:

  • "strict"
  • "liberal"

Type: enum (of string) Default: "performance"

The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.

Must be one of:

  • "performance"
  • "mixed"
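
The effect of the two presets can be summarized as a small lookup table. In the sketch below the table itself follows the description above, while the idea that an explicitly configured mode takes precedence over the preset is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Optional

# Quantization modes implied by each preset, as described above.
PRESET_MODES: Dict[str, Dict[str, str]] = {
    "performance": {"weights": "symmetric", "activations": "symmetric"},
    "mixed": {"weights": "symmetric", "activations": "asymmetric"},
}


def resolve_modes(preset: str = "performance",
                  weights_mode: Optional[str] = None,
                  activations_mode: Optional[str] = None) -> Dict[str, str]:
    """Start from the preset and apply any explicitly given modes on top."""
    if preset not in PRESET_MODES:
        raise ValueError(f"Unknown preset: {preset}")
    modes = dict(PRESET_MODES[preset])
    # Assumption: explicit per-group settings override the preset.
    if weights_mode is not None:
        modes["weights"] = weights_mode
    if activations_mode is not None:
        modes["activations"] = activations_mode
    return modes


default_modes = resolve_modes()
mixed_modes = resolve_modes("mixed")
```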

Type: boolean Default: true

Whether the model inputs should be immediately quantized prior to any other model operations.

Type: boolean Default: false

Whether the model outputs should be additionally quantized.

Type: object

Constraints to be applied to model weights quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"
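
For orientation, the difference between the two modes can be sketched with textbook uniform "fake" quantization of a single value. This is a generic illustration and not necessarily the exact formulation NNCF uses (Quantization.md is the authoritative reference). The function names and the signed 8-bit default are assumptions.

```python
def quantize_symmetric(x: float, scale: float, bits: int = 8) -> float:
    """Signed symmetric quantization: the grid is centred on zero."""
    level_high = 2 ** (bits - 1) - 1
    level_low = -(2 ** (bits - 1))
    step = scale / level_high
    level = max(level_low, min(level_high, round(x / step)))
    return level * step


def quantize_asymmetric(x: float, input_low: float, input_high: float,
                        bits: int = 8) -> float:
    """Asymmetric quantization: the grid spans [input_low, input_high]."""
    levels = 2 ** bits
    step = (input_high - input_low) / (levels - 1)
    level = max(0, min(levels - 1, round((x - input_low) / step)))
    return input_low + level * step


# A value inside the range lands on the nearest grid point,
# while a value outside the range is clamped to the range boundary.
inside = quantize_symmetric(0.5, scale=1.27)
clamped = quantize_symmetric(5.0, scale=1.27)
non_negative = quantize_asymmetric(-1.0, input_low=0.0, input_high=2.55)
```

The symmetric form needs only one range parameter per quantizer, whereas the asymmetric form can fit one-sided distributions, such as post-ReLU activations, without wasting half of the grid.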

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: object

Constraints to be applied to model activations quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: array of array

Specifies operations in the model which will share the same quantizer module for activations. This is helpful in case one and the same quantizer scale is required for each input of this operation. Each sub-array will define a group of model operation inputs that have to share a single actual quantization module, each entry in this subarray should correspond to exactly one node in the NNCF graph and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of this sub-array.
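
The two constraints stated above (groups must not overlap, and the shared quantizer is associated with the first element of each group) can be expressed in a few lines. The sketch below is illustrative only; the helper name and the node names are hypothetical.

```python
from typing import Dict, List


def shared_quantizer_owners(groups: List[List[str]]) -> List[str]:
    """Check that groups do not overlap and return the first node of each."""
    seen: Dict[str, int] = {}
    for index, group in enumerate(groups):
        if not group:
            raise ValueError(f"Group {index} is empty")
        for node in group:
            if node in seen:
                raise ValueError(
                    f"Node {node!r} appears in groups {seen[node]} and {index}"
                )
            seen[node] = index
    # The final quantizer of each group is associated with its first element.
    return [group[0] for group in groups]


# Hypothetical node names for two independent groups of operation inputs.
groups = [
    ["Model/Head[a]/conv2d_0", "Model/Head[b]/conv2d_0"],
    ["Model/Neck/cat_0", "Model/Neck/cat_1"],
]
owners = shared_quantizer_owners(groups)
```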

No Additional Items

Each item of this array must be:

Type: object

This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.

No Additional Properties
Example:

{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using the average (across all initialization samples) of the per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the corresponding average of the per-sample maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values in the tensor to be quantized, averaged across all initialization samples. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using the average (across all initialization samples) of the per-sample minima of values in the tensor to be quantized, maximum quantizer range initialized using the corresponding average of the per-sample maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to the specified percentiles of the observed values in the tensor to be quantized, averaged across all initialization samples. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
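The difference between per-tensor and per-channel ranges can be sketched in plain Python. This is only an illustration of the idea for a weight tensor (channels along dimension 0); the helper names and the sample weight are hypothetical and are not part of NNCF.

```python
from typing import List, Tuple

def per_tensor_range(weight: List[List[float]]) -> Tuple[float, float]:
    """One (min, max) pair shared by the whole tensor."""
    flat = [v for row in weight for v in row]
    return min(flat), max(flat)

def per_channel_ranges(weight: List[List[float]]) -> List[Tuple[float, float]]:
    """One (min, max) pair per slice along dimension 0 (the weight case)."""
    return [(min(row), max(row)) for row in weight]

# Hypothetical 2-D weight: 3 output channels with 4 values each.
weight = [
    [-0.5, 0.1, 0.2, 0.4],
    [-4.0, 1.0, 2.0, 3.0],
    [-0.01, 0.0, 0.02, 0.03],
]

tensor_range = per_tensor_range(weight)
channel_ranges = per_channel_ranges(weight)
print(tensor_range)
print(channel_ranges)
```

With a single per-tensor range, the channel with small values shares the wide range dictated by the channel with large values; per-channel ranges avoid that at the cost of storing one range per channel.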

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using maxima respectively. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to the median ± 3 median absolute deviations of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"
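To make the 'online' versus 'offline' distinction concrete, here is a minimal sketch contrasting the min_max and mean_min_max statistics as they are described above. The function names and sample data are hypothetical; this is not NNCF's implementation.

```python
from statistics import mean
from typing import Iterable, List, Tuple

def min_max_online(samples: Iterable[List[float]]) -> Tuple[float, float]:
    """Running global min/max: only two scalars are kept between samples."""
    lo, hi = float("inf"), float("-inf")
    for sample in samples:
        lo = min(lo, min(sample))
        hi = max(hi, max(sample))
    return lo, hi

def mean_min_max_offline(samples: Iterable[List[float]]) -> Tuple[float, float]:
    """Per-sample minima and maxima are stored first, then averaged."""
    minima: List[float] = []
    maxima: List[float] = []
    for sample in samples:
        minima.append(min(sample))
        maxima.append(max(sample))
    return mean(minima), mean(maxima)

# Three hypothetical initialization samples of the same tensor.
samples = [[-1.0, 0.5, 2.0], [-3.0, 0.0, 1.0], [-2.0, 0.25, 6.0]]

online_range = min_max_online(samples)
offline_range = mean_min_max_offline(samples)
print(online_range)
print(offline_range)
```

The averaged variant is less sensitive to a single extreme sample (the 6.0 above), while the global variant guarantees that no observed value is clipped.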

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
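As an illustration of how the two percentile parameters bound the initial range, the sketch below computes the 0.1 and 99.9 percentiles over all observed values. The interpolation rule (linear between closest ranks) and the helper names are assumptions made for this example; NNCF may compute percentiles differently.

```python
from typing import List, Sequence, Tuple

def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0..100), linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be within [0, 100]")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def percentile_range(
    samples: Sequence[Sequence[float]],
    min_percentile: float = 0.1,
    max_percentile: float = 99.9,
) -> Tuple[float, float]:
    """Range taken from percentiles over the entire initialization sample set."""
    flat: List[float] = [v for sample in samples for v in sample]
    return percentile(flat, min_percentile), percentile(flat, max_percentile)

# Two hypothetical samples that together cover the values 0..1000.
samples = [
    [float(v) for v in range(0, 500)],
    [float(v) for v in range(500, 1001)],
]
low, high = percentile_range(samples)
print(low, high)
```

Moving the percentiles closer to 0 and 100 widens the range and clips fewer outliers; moving them inward narrows the range and improves resolution for the bulk of the values.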

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using maxima respectively. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to the median ± 3 median absolute deviations of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"
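The threesigma description above can be read as "median plus or minus three median absolute deviations". A minimal sketch of that statistic follows; the function name and data are hypothetical, and NNCF's own computation may differ in details such as scaling.

```python
from statistics import median
from typing import Sequence, Tuple

def three_mad_range(values: Sequence[float]) -> Tuple[float, float]:
    """Range equal to the median +/- 3 median absolute deviations (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - 3.0 * mad, med + 3.0 * mad

# Hypothetical observations with one strong outlier.
observed = [1.0, 2.0, 3.0, 4.0, 100.0]
mad_range = three_mad_range(observed)
print(mad_range)
```

Because both the center and the spread are medians, the outlier 100.0 has no influence on the resulting range, unlike with the min/max based initializers.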

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
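The interplay between scope lists and scope validation might look like the sketch below. It assumes that entries prefixed with "{re}" are treated as regular expressions searched within the node name and that all other entries must match a node name exactly; the helper names, the `validate_scopes` parameter name, and the node names are hypothetical, and NNCF's actual matching rules may differ.

```python
import re
from typing import List, Sequence, Union

RE_PREFIX = "{re}"

def scope_matches(node_name: str, scope: str) -> bool:
    """Assumed rule: '{re}' entries are regexes, anything else is an exact name."""
    if scope.startswith(RE_PREFIX):
        return re.search(scope[len(RE_PREFIX):], node_name) is not None
    return node_name == scope

def unmatched_scopes(
    scopes: Union[str, Sequence[str]], node_names: Sequence[str]
) -> List[str]:
    """Return the scope entries that match no node in the graph."""
    scope_list = [scopes] if isinstance(scopes, str) else list(scopes)
    return [
        s for s in scope_list
        if not any(scope_matches(n, s) for n in node_names)
    ]

def check_scopes(
    scopes: Union[str, Sequence[str]],
    node_names: Sequence[str],
    validate_scopes: bool = True,
) -> List[str]:
    """Raise on unmatched scopes when validation is on, else just report them."""
    missing = unmatched_scopes(scopes, node_names)
    if missing and validate_scopes:
        raise RuntimeError(f"Scopes not found in the model graph: {missing}")
    return missing

# Hypothetical node names of a small model graph.
node_names = ["LeNet/conv_0", "LeNet/relu_0", "LeNet/conv_1", "LeNet/relu_1"]
print(unmatched_scopes("{re}conv.*", node_names))
print(check_scopes(["LeNet/relu_2"], node_names, validate_scopes=False))
```

Leaving validation enabled catches typos in scope names early, which would otherwise silently leave the intended layers unaffected.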

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: boolean Default: false

[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only) or to false to export to OpenVINO-supported FakeQuantize ONNX (all quantization settings supported).

Type: enum (of string) Default: "enable"

This option controls whether to apply the overflow issue fix for the appropriate NNCF config or not. If set to disable, the fix will not be applied. If set to enable or first_layer_only, while appropriate target_devices are chosen, the fix will be applied to all layers or to the first convolutional layer respectively.

Must be one of:

  • "enable"
  • "disable"
  • "first_layer_only"

Type: object

Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.

No Additional Properties

Type: number

Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.

Type: number Default: 1

A zero-based index of the epoch, upon reaching which the activations will start to be quantized.

Type: number Default: 1

Epoch index upon which the weights will start to be quantized.
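The effect of the two start epochs can be sketched as a simple gate evaluated at the start of every epoch. The function and parameter names below are hypothetical, and the sketch assumes that quantization switches on as soon as the zero-based epoch index reaches the configured value.

```python
from typing import Dict

def enabled_quantizers(
    epoch: int, activations_start: int = 1, weights_start: int = 1
) -> Dict[str, bool]:
    """Which quantizer groups are active at the given zero-based epoch."""
    return {
        "activations": epoch >= activations_start,
        "weights": epoch >= weights_start,
    }

# Hypothetical schedule: activations from epoch 1, weights from epoch 3.
for epoch in range(4):
    print(epoch, enabled_quantizers(epoch, activations_start=1, weights_start=3))
```

With the defaults (both equal to 1), the first epoch (index 0) trains the model without any quantization applied.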

Type: number

Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.

Type: number Default: 30

Duration, in epochs, of the learning rate dropping process.

Type: number

Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.

Type: number Default: 0.001

Initial value of learning rate.

Type: number Default: 1e-05

Initial value of weight decay.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

Type: object

Applies filter pruning during training of the model to effectively remove entire sub-dimensions of tensors in the original model from computation and therefore increase performance.
See Pruning.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "filter_pruning"

Type: object
No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.

Type: number Default: 0.0

Initial value of the pruning level applied to the prunable operations.

Type: object
No Additional Properties

Type: enum (of string) Default: "L2"

The type of filter importance metric.

Must be one of:

  • "L2"
  • "L1"
  • "geometric_median"
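For intuition, the "L1" and "L2" metrics can be sketched as norms computed over the flattened weights of each filter, with the lowest-scoring filters being the first candidates for pruning. The helper names and filter values below are hypothetical; the "geometric_median" metric is not covered by this sketch.

```python
import math
from typing import Callable, List, Sequence

def l1_norm(filter_weights: Sequence[float]) -> float:
    """Sum of absolute values of the filter's weights."""
    return sum(abs(v) for v in filter_weights)

def l2_norm(filter_weights: Sequence[float]) -> float:
    """Euclidean norm of the filter's weights."""
    return math.sqrt(sum(v * v for v in filter_weights))

def least_important(
    filters: Sequence[Sequence[float]],
    count: int,
    norm: Callable[[Sequence[float]], float],
) -> List[int]:
    """Indices of the `count` filters with the smallest importance score."""
    scores = [norm(f) for f in filters]
    return sorted(range(len(filters)), key=lambda i: scores[i])[:count]

# Three hypothetical filters, each flattened to two weights.
filters = [[3.0, 0.0], [2.0, 2.0], [5.0, 5.0]]
print(least_important(filters, 1, l1_norm))
print(least_important(filters, 1, l2_norm))
```

The two metrics can disagree: the filter [3.0, 0.0] has the smallest L1 norm, while [2.0, 2.0] has the smallest L2 norm.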

Type: number Default: 0.5

Target value of the pruning level for the operations that can be pruned. The operations are determined by analysis of the model architecture during the pruning algorithm initialization stage.

Type: number Default: 100

Number of epochs during which the pruning level is increased from pruning_init to pruning_target.

Type: number

Target value of the pruning level for model FLOPs.

Type: enum (of string) Default: "exponential"

The type of scheduling to use for adjusting the target pruning level.

Must be one of:

  • "exponential"
  • "exponential_with_bias"
  • "baseline"

Type: number Default: 0

Number of epochs for model pretraining before starting filter pruning.

Type: enum (of string) Default: "unweighted_ranking"

The type of filter ranking across the layers.

Must be one of:

  • "unweighted_ranking"
  • "learned_ranking"

Type: boolean Default: false

Whether to prune layers independently (choose filters with the smallest importance in each layer separately) or not.

Type: boolean Default: false

Whether to prune first convolutional layers or not. A 'first' convolutional layer is one for which the path from the model input to the layer contains no other convolution operations.

Type: boolean Default: false

Whether to prune downsampling convolutional layers (with stride > 1) or not.

Type: boolean Default: true

Whether to prune parameters of the Batch Norm layer that corresponds to pruned filters of the convolutional layer which feeds its output to this Batch Norm.

Type: object

Describes parameters specific to the LeGR pruning algorithm. See Pruning.md for more details.

No Additional Properties

Type: number Default: 400

Number of generations for the evolution algorithm.

Type: number Default: 200

Number of training steps to estimate pruned model accuracy.

Type: number Default: 0.8

Pruning level at which the LeGR algorithm is trained on the model. If the learned ranking will be used for multiple pruning levels, the highest of them should be used as max_pruning. If the model will be pruned with a single pruning level, that target level should be used.

Type: number Default: 42

Random seed for LeGR coefficients generation.

Type: number Default: 64

Size of population for the evolution algorithm.

Type: number Default: 16

Number of samples for the evolution algorithm.

Type: number Default: 0.1

Mutation percentage for the evolution algorithm.

Type: number Default: 1

Scale sigma for the evolution algorithm.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: object

Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its magnitude. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "magnitude_sparsity"

Type: number Default: 0.0

Initial value of the sparsity level applied to the model.
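A minimal sketch of what a sparsity level means for a single weight tensor: the given fraction of weights with the smallest magnitudes is masked to zero and the rest is kept as-is. The helper name and the weight values are hypothetical, and tie handling in NNCF may differ.

```python
from typing import List, Sequence

def magnitude_mask(weights: Sequence[float], sparsity_level: float) -> List[int]:
    """Binary mask that zeroes the smallest-magnitude share of the weights."""
    if not 0.0 <= sparsity_level <= 1.0:
        raise ValueError("sparsity_level must be within [0, 1]")
    n_zero = int(len(weights) * sparsity_level)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in by_magnitude[:n_zero]:
        mask[i] = 0
    return mask

# Hypothetical flattened weight tensor.
weights = [0.05, -0.8, 0.3, -0.01, 1.2, -0.4]
mask = magnitude_mask(weights, sparsity_level=0.5)
sparse_weights = [w if m else 0.0 for w, m in zip(weights, mask)]
print(mask)
print(sparse_weights)
```

A level of 0.0 (the default initial value) keeps every weight, and the mask becomes progressively sparser as the scheduler raises the level toward the target.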

Type: object
No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.

Type: object
No Additional Properties

Type: string Default: "global"

The mode of sparsity level setting.
global - the sparsity level is calculated across all weight values in the network across layers, local - the sparsity level can be set per-layer and within each layer is computed with respect only to the weight values within that layer.
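The practical difference between the two modes shows up when layers have very different weight scales. The sketch below applies the same 0.5 level once with a single threshold over all weights and once with a threshold per layer; the helper names, layer names, and values are hypothetical, and the plain absolute value is used as the importance measure for simplicity.

```python
from typing import Dict, List, Sequence

def threshold_for_level(values: Sequence[float], level: float) -> float:
    """Magnitude threshold below which `level` of the values fall."""
    ordered = sorted(abs(v) for v in values)
    n_zero = int(len(ordered) * level)
    return ordered[n_zero] if n_zero < len(ordered) else float("inf")

def sparsify(values: Sequence[float], threshold: float) -> List[float]:
    """Zero every value whose magnitude is below the threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in values]

# Two hypothetical layers with very different weight magnitudes.
layers: Dict[str, List[float]] = {
    "conv1": [0.9, -0.8, 0.7, 0.6],
    "conv2": [0.05, -0.04, 0.03, 0.02],
}
level = 0.5

# Global mode: one threshold computed over all weights of all layers.
all_values = [v for ws in layers.values() for v in ws]
global_threshold = threshold_for_level(all_values, level)
global_result = {name: sparsify(ws, global_threshold) for name, ws in layers.items()}

# Local mode: each layer gets its own threshold.
local_result = {
    name: sparsify(ws, threshold_for_level(ws, level)) for name, ws in layers.items()
}
print(global_result)
print(local_result)
```

In this example the global mode removes all of conv2 and none of conv1, while the local mode removes half of each layer.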

Type: enum (of string)

The type of scheduling to use for adjusting the target sparsity level. Default - exponential for rb_sparsity, polynomial otherwise.

Must be one of:

  • "polynomial"
  • "exponential"
  • "adaptive"
  • "multistep"

Type: number Default: 0.5

Target sparsity level for the model, to be reached at the end of the compression schedule.

Type: number Default: 90

Index of the epoch upon which the sparsity level of the model is scheduled to become equal to sparsity_target.

Type: number Default: 100

Index of the epoch upon which the sparsity mask will be frozen and no longer trained.

Type: boolean Default: false

Whether the function-based sparsity level schedulers should update the sparsity level after each optimizer step instead of each epoch step.

Type: number

Number of optimizer steps in one epoch. Required to start proper scheduling in the first training epoch if update_per_optimizer_step is true.

Type: array of number Default: [90]

A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).

No Additional Items

Each item of this array must be:

Type: array of number Default: [0.1, 0.5]

Multistep scheduler only - Levels of sparsity to use at each step of the scheduler as specified in the multistep_steps attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of the multistep_steps by one. The last sparsity level will function as the ultimate sparsity target, overriding the "sparsity_target" setting if it is present.

No Additional Items

Each item of this array must be:
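The relationship between the multistep steps and levels can be sketched as a lookup: the first level applies from the start, and each step switches to the next level. The function name is hypothetical, and the sketch assumes the switch happens when the epoch index reaches the step value.

```python
from bisect import bisect_right
from typing import Sequence

def multistep_level(
    epoch: int, steps: Sequence[int], levels: Sequence[float]
) -> float:
    """Sparsity level in effect at `epoch` for a multistep schedule."""
    if len(levels) != len(steps) + 1:
        raise ValueError("levels must have exactly one more entry than steps")
    if list(steps) != sorted(steps):
        raise ValueError("steps must be sorted in ascending order")
    # Number of steps already reached selects the current level.
    return levels[bisect_right(steps, epoch)]

# The schema defaults: one step at 90 and two levels.
steps = [90]
levels = [0.1, 0.5]
for epoch in (0, 89, 90, 120):
    print(epoch, multistep_level(epoch, steps, levels))
```

The length check mirrors the requirement that the list of levels is longer than the list of steps by one, and the last entry acts as the final sparsity target.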

Type: number Default: 1

A conventional patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.

Type: number Default: 0.9

For polynomial scheduler - determines the corresponding power value.

Type: boolean Default: true

For polynomial scheduler - if true, then the target sparsity level will be approached in a concave manner, and in a convex manner otherwise.

Type: enum (of string) Default: "normed_abs"

Determines the way in which the weight values will be sorted after being aggregated in order to determine the sparsity threshold corresponding to a specific sparsity level.

Must be one of:

  • "abs"
  • "normed_abs"

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: object

Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its importance as determined by the regularization-based sparsity algorithm. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "rb_sparsity"

Type: number Default: 0.0

Initial value of the sparsity level applied to the model.

Type: object
No Additional Properties

Type: string Default: "global"

The mode of sparsity level setting.
global - the sparsity level is calculated across all weight values in the network across layers, local - the sparsity level can be set per-layer and within each layer is computed with respect only to the weight values within that layer.

Type: enum (of string)

The type of scheduling to use for adjusting the target sparsity level. Default - exponential for rb_sparsity, polynomial otherwise.

Must be one of:

  • "polynomial"
  • "exponential"
  • "adaptive"
  • "multistep"

Type: number Default: 0.5

Target sparsity level for the model, to be reached at the end of the compression schedule.

Type: number Default: 90

Index of the epoch upon which the sparsity level of the model is scheduled to become equal to sparsity_target.

Type: number Default: 100

Index of the epoch upon which the sparsity mask will be frozen and no longer trained.

Type: boolean Default: false

Whether the function-based sparsity level schedulers should update the sparsity level after each optimizer step instead of each epoch step.

Type: number

Number of optimizer steps in one epoch. Required to start proper scheduling in the first training epoch if update_per_optimizer_step is true.

Type: array of number Default: [90]

A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).

No Additional Items

Each item of this array must be:

Type: array of number Default: [0.1, 0.5]

Multistep scheduler only - Levels of sparsity to use at each step of the scheduler as specified in the multistep_steps attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of the multistep_steps by one. The last sparsity level will function as the ultimate sparsity target, overriding the "sparsity_target" setting if it is present.

No Additional Items

Each item of this array must be:

Type: number Default: 1

A conventional patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.

Type: number Default: 0.9

For polynomial scheduler - determines the corresponding power value.

Type: boolean Default: true

For polynomial scheduler - if true, then the target sparsity level will be approached in a concave manner, and in a convex manner otherwise.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

Type: object

This algorithm is only useful in combination with other compression algorithms and improves the end accuracy result of the corresponding algorithm by calculating knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "knowledge_distillation"

Type: enum (of string)

Type of Knowledge Distillation Loss.

Must be one of:

  • "mse"
  • "softmax"

Type: number Default: 1.0

Knowledge Distillation loss value multiplier.

Type: number Default: 1.0

softmax type only - Temperature for logits softening.
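The role of the temperature can be illustrated with a generic softened-softmax loss. This is a textbook-style sketch, not NNCF's exact loss: the function names and logits are hypothetical, and any additional scaling that NNCF may apply is left out.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by the temperature (numerically stabilized)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy between softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

# Hypothetical logits of the original (teacher) and compressed (student) models.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 2.0, 0.0]

sharp = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)
loss = soft_cross_entropy(student_logits, teacher_logits, temperature=2.0)
print(sharp)
print(soft)
print(loss)
```

A higher temperature flattens the teacher distribution, so the student also receives a training signal from the relative ordering of the less likely classes.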

Type: object

This algorithm takes no additional parameters and is used when you want to load a checkpoint trained with another sparsity algorithm and do other compression without changing the sparsity mask.

No Additional Properties

Type: const
Specific value: "const_sparsity"

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

No Additional Properties

Type: const
Specific value: "experimental_quantization"

Type: object

Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.

No Additional Properties

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using maxima respectively. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to the median ± 3 median absolute deviations of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"
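The percentile and mean_percentile types differ only in where the percentile is taken: over all observed values at once, or per initialization sample with the results averaged afterwards. A minimal sketch under the assumption of linear-interpolation percentiles follows; the helper names and data are hypothetical.

```python
from statistics import mean
from typing import Sequence

def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0..100), linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def percentile_over_all(samples: Sequence[Sequence[float]], q: float) -> float:
    """'percentile' style: one percentile over the pooled sample set."""
    return percentile([v for sample in samples for v in sample], q)

def mean_percentile(samples: Sequence[Sequence[float]], q: float) -> float:
    """'mean_percentile' style: per-sample percentiles, then their average."""
    return mean([percentile(sample, q) for sample in samples])

# Two hypothetical initialization samples with different value scales.
samples = [[0.0, 1.0, 2.0], [0.0, 10.0, 20.0]]
pooled = percentile_over_all(samples, 50)
averaged = mean_percentile(samples, 50)
print(pooled, averaged)
```

When samples differ strongly in scale, the two variants can produce noticeably different ranges, as the pooled and averaged medians above show.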

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the corresponding maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
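The interplay between scope entries and scope validation can be sketched as follows. This is an illustration under assumptions, not NNCF's implementation: the matching rule (a "{re}" prefix marks a regular expression, anything else must equal the node name) is inferred from the examples above, and `scope_matches` and `check_scopes` are hypothetical helper names.

```python
import re


def scope_matches(node_name, scope):
    """Assumed rule: '{re}' marks a regular expression, otherwise names must be equal."""
    if scope.startswith("{re}"):
        return re.search(scope[len("{re}"):], node_name) is not None
    return node_name == scope


def check_scopes(node_names, scopes, validate=True):
    """Return the nodes selected by `scopes`; optionally fail on scopes that match nothing."""
    if isinstance(scopes, str):  # a single string or a list of strings is accepted
        scopes = [scopes]
    unmatched = [s for s in scopes if not any(scope_matches(n, s) for n in node_names)]
    if validate and unmatched:
        raise RuntimeError(f"Scopes not found in the model graph: {unmatched}")
    return [n for n in node_names if any(scope_matches(n, s) for s in scopes)]


# Hypothetical node names of a small model graph.
nodes = ["LeNet/conv2d_0", "LeNet/relu_0", "LeNet/relu_1"]
selected = check_scopes(nodes, "{re}conv.*")
```

Passing `validate=False` in this sketch mirrors turning the validation flag off: entries that match nothing are then silently skipped instead of raising.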

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"
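A per-layer range initialization list might be assembled as in the sketch below. The key names (`initializer`, `range`, `type`, `num_init_samples`, `params`, `min_percentile`, `max_percentile`, `target_quantizer_group`, `target_scopes`, `ignored_scopes`) are assumptions, since this section lists the properties without their names; check them against the full schema before use.

```python
import json

# Key names are assumed; the values reuse the defaults and examples shown in this section.
per_layer_range_init = [
    {
        "type": "mean_min_max",
        "num_init_samples": 128,
        "target_quantizer_group": "weights",
        "target_scopes": [
            "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]"
        ],
    },
    {
        "type": "percentile",
        "num_init_samples": 256,
        "params": {"min_percentile": 0.1, "max_percentile": 99.9},
        "target_quantizer_group": "activations",
        "ignored_scopes": "{re}conv.*",
    },
]

config_fragment = {"initializer": {"range": per_layer_range_init}}
serialized = json.dumps(config_fragment, indent=4)
print(serialized)
```

Each list entry carries its own initializer type and sample count, so different groups of quantizers can be initialized with different statistics.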

Type: object

This initializer performs advanced selection of bitwidth per each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.

No Additional Properties


Type of precision initialization.

hawq

Type: const

Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md

Specific value: "hawq"

autoq

Type: const

Applies the AutoQ algorithm to determine the best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md

Specific value: "autoq"

manual

Type: const

Allows the exact bitwidth for each quantizer location to be specified manually via the following config options.

Specific value: "manual"

Type: array of number Default: [2, 4, 8]

A list of bitwidths to choose from when performing precision initialization. Overrides bits constraints specified in weight and activation sections.

No Additional Items

Each item of this array must be:


Example:

[
    4,
    8
]

Type: number Default: 100

Number of data points used to iteratively estimate the Hessian trace.

Type: number Default: 200

Maximum number of iterations of the Hutchinson algorithm to estimate the Hessian trace.

Type: number Default: 0.0001

Minimum relative tolerance for stopping the Hutchinson algorithm. It is calculated between the mean trace from the previous iteration and the current one.
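How an iteration cap and a relative tolerance interact in a Hutchinson-style trace estimate can be sketched as below. This is a generic textbook version, not the HAWQ implementation: `hutchinson_trace`, `iter_number`, and `tolerance` are hypothetical names, the small matrix `H` stands in for a Hessian, and a real implementation would obtain Hessian-vector products from the model's gradients.

```python
import random


def hutchinson_trace(matvec, dim, iter_number=200, tolerance=1e-4, seed=0):
    """Estimate trace(H) as the running mean of v^T H v over random +/-1 vectors v."""
    rng = random.Random(seed)
    mean_trace = 0.0
    for i in range(1, iter_number + 1):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec(v)
        sample = sum(a * b for a, b in zip(v, hv))
        new_mean = mean_trace + (sample - mean_trace) / i
        # Stop once the running mean changes by less than the relative tolerance.
        if i > 1 and abs(new_mean - mean_trace) <= tolerance * abs(mean_trace):
            return new_mean, i
        mean_trace = new_mean
    return mean_trace, iter_number


H = [[2.0, 0.5], [0.5, 3.0]]  # small symmetric stand-in for a Hessian, trace = 5


def matvec(v):
    """Multiply H by the vector v."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


estimate, iterations = hutchinson_trace(matvec, dim=2)
```

A looser tolerance stops earlier at the cost of a noisier estimate; the iteration cap bounds the cost when the running mean keeps fluctuating.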

Type: number

For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified one. The bit complexity of the model is the sum of the bit complexities of all quantized layers, each of which is the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a compression_ratio equal to 0.25, and a uniform INT4-quantized model would have a compression_ratio equal to 0.125.
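Both meanings of the ratio reduce to simple arithmetic, worked through in the sketch below. The per-layer FLOP and parameter counts are made-up numbers, and the helper names are hypothetical.

```python
def bit_complexity(layer_flops, bits_per_layer):
    """Sum over layers of FLOPs multiplied by the bitwidth chosen for that layer."""
    return sum(flops * bits for flops, bits in zip(layer_flops, bits_per_layer))


def size_ratio(param_counts, bits_per_layer):
    """'autoq' mode: quantized parameter size relative to FP32 (32 bits per parameter)."""
    quantized = sum(n * b for n, b in zip(param_counts, bits_per_layer))
    return quantized / (32 * sum(param_counts))


layer_flops = [100, 300, 600]  # hypothetical per-layer FLOP counts

int8 = bit_complexity(layer_flops, [8, 8, 8])
mixed = bit_complexity(layer_flops, [8, 4, 4])
hawq_ratio = int8 / mixed  # 'hawq' mode: INT8 complexity over mixed-precision complexity

param_counts = [1000, 1000]  # hypothetical per-layer parameter counts
uniform_int8 = size_ratio(param_counts, [8, 8])
uniform_int4 = size_ratio(param_counts, [4, 4])
```

In the hawq reading a larger ratio means a more aggressively compressed model, whereas in the autoq reading a smaller value does.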

Type: number Default: 1.0

The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the autoq_eval_loader registered via register_default_init_args.

Type: number Default: 20

The number of random policies applied at the beginning of AutoQ precision initialization to populate the replay buffer with experiences. This key is meant for internal testing use; users do not need to configure it.

Type: array of array

Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.

No Additional Items

Each item of this array must be:


Example:

[
    [
        2,
        "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
    ],
    [
        8,
        "ResNet/ReLU[relu]/relu__0|OUTPUT"
    ]
]

Type: string

Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.

Type: boolean Default: false

Whether to dump data related to the precision initialization algorithm. The HAWQ dump includes the bitwidth graph, average traces and different plots. The AutoQ dump includes the DDPG agent learning trajectory in tensorboard and mixed-precision environment metadata.

Type: enum (of string) Default: "liberal"

The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their output to one or more of the same modules as input (weight quantizers count as well) will have the same bitwidth. The 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.

Must be one of:

  • "strict"
  • "liberal"

Type: enum (of string) Default: "performance"

The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.

Must be one of:

  • "performance"
  • "mixed"

Type: boolean Default: true

Whether the model inputs should be immediately quantized prior to any other model operations.

Type: boolean Default: false

Whether the model outputs should be additionally quantized.

Type: object

Constraints to be applied to model weights quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
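The three-way behaviour of this flag can be summarized in a short sketch. `resolve_signed` is a hypothetical helper, and the rule used for the unspecified case (signed whenever a negative value was observed) is an assumption; the text above only says that the initialization statistics decide.

```python
def resolve_signed(requested, observed_min, observed_max):
    """Decide whether a quantizer ends up signed.

    requested: True, False, or None (unspecified), as in the config.
    observed_min / observed_max: value range seen during initialization.
    """
    has_negative = observed_min < 0
    has_positive = observed_max > 0
    if requested is True:
        return True
    if requested is False:
        # Falls back to signed when the observed values have differing signs.
        return has_negative and has_positive
    # Unspecified: rely on the statistics (assumed rule, see the note above).
    return has_negative


relu_output = resolve_signed(None, 0.0, 6.0)       # non-negative activations
forced_unsigned = resolve_signed(False, -1.0, 1.0)  # mixed signs override the request
```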

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
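The difference between per-tensor and per-channel statistics is sketched below on a nested list. The tensor values and helper names are hypothetical; the outer list plays the role of dimension 0, which is the channel dimension the text above names for weights.

```python
def per_tensor_range(tensor):
    """One (min, max) pair for the whole tensor."""
    flat = [v for channel in tensor for v in channel]
    return min(flat), max(flat)


def per_channel_ranges(tensor):
    """One (min, max) pair per channel; channels are the outer list here."""
    return [(min(channel), max(channel)) for channel in tensor]


# Hypothetical weight tensor with 2 output channels (dimension 0) and 3 values each.
weights = [[-0.5, 0.1, 0.4], [-2.0, 0.0, 3.0]]

tensor_range = per_tensor_range(weights)
channel_ranges = per_channel_ranges(weights)
```

Per-channel ranges keep the narrow first channel from being quantized with the much wider range of the second one, at the cost of storing one range per channel.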

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: object

Constraints to be applied to model activations quantization only.

No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: boolean Default: false

Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.

Type: array of array

Specifies operations in the model which will share the same quantizer module for activations. This is helpful in case one and the same quantizer scale is required for each input of this operation. Each sub-array defines a group of model operation inputs that have to share a single actual quantization module. Each entry in a sub-array should correspond to exactly one node in the NNCF graph, and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of this sub-array.

No Additional Items

Each item of this array must be:
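The grouping rules for shared quantizers (no overlap between groups, the first element owns the quantizer) can be checked with a sketch like the following. `owners_of_unified_groups` and the node names are hypothetical; real entries must name nodes of the NNCF graph.

```python
def owners_of_unified_groups(groups):
    """Map every node to the first node of its group, rejecting repeated nodes."""
    owner = {}
    for group in groups:
        for node in group:
            if node in owner:
                raise ValueError(f"Node {node!r} is listed more than once")
            owner[node] = group[0]
    return owner


# Hypothetical node names, two non-overlapping groups.
groups = [
    ["Model/cat_0_input_0", "Model/cat_0_input_1"],
    ["Model/add_0_input_0", "Model/add_0_input_1"],
]
owner = owners_of_unified_groups(groups)
```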

Type: object

This option is used to specify overriding quantization constraints for a specific scope, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.

No Additional Properties
Example:

{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the corresponding maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
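The 'Online' and 'Offline' labels attached to the initializer types above can be illustrated with two toy collectors. They are sketches only: the class names are hypothetical, and how NNCF actually stores its statistics is not shown in this section. The online collector keeps a constant amount of state, while the offline one keeps an entry per initialization sample.

```python
class OnlineMinMax:
    """'min_max' style: keeps only the running extremes (constant memory)."""

    def __init__(self):
        self.low = float("inf")
        self.high = float("-inf")

    def update(self, sample):
        self.low = min(self.low, min(sample))
        self.high = max(self.high, max(sample))

    def result(self):
        return self.low, self.high


class OfflineMeanMinMax:
    """'mean_min_max' style: stores per-sample extremes, then averages them."""

    def __init__(self):
        self.lows = []
        self.highs = []

    def update(self, sample):
        self.lows.append(min(sample))
        self.highs.append(max(sample))

    def result(self):
        return sum(self.lows) / len(self.lows), sum(self.highs) / len(self.highs)


# Two hypothetical initialization samples.
samples = [[-1.0, 2.0], [-3.0, 4.0]]
online, offline = OnlineMinMax(), OfflineMeanMinMax()
for s in samples:
    online.update(s)
    offline.update(s)
```

The stored lists in the offline collector grow with the number of initialization samples, which is the RAM overhead the description warns about.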

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the corresponding maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: object

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object
No Additional Properties

Type: enum (of string)

Mode of quantization. See Quantization.md for more details.

Must be one of:

  • "symmetric"
  • "asymmetric"

Type: number Default: 8

Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it does not match the corresponding bitwidth constraints from the hardware configuration.

Type: boolean

Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine best approach.
Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.

Type: boolean Default: false

Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).

Type: object
No Additional Properties


This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.

global_range_init_configuration

Type: object
No Additional Properties

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the corresponding maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
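The 'threesigma' type listed above can be worked through on a handful of values. The sketch follows the description literally (median plus or minus three median absolute deviations); whether the actual implementation rescales the deviation is not stated here, so treat the helper `threesigma_range` as an illustration only.

```python
from statistics import median


def threesigma_range(values):
    """Median +/- 3 * MAD, following the description of 'threesigma' literally."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center - 3 * mad, center + 3 * mad


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical observations with one outlier
low, high = threesigma_range(values)
```

Because both the centre and the spread are medians, the single outlier at 100.0 has no influence on the resulting range, unlike with the plain minimum and maximum.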

per_layer_range_init_configuration

Type: array of object
No Additional Items

Each item of this array must be:

Type: object

Type: number Default: 256

Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.

Default: "mixed_min_max"

Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.

mixed_min_max

Type: const

Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.

Specific value: "mixed_min_max"

min_max

Type: const

Minimum quantizer range initialized using global minimum of values in the tensor to be quantized, maximum quantizer range initialized using global maximum of the same values. Online.

Specific value: "min_max"

mean_min_max

Type: const

Minimum quantizer range initialized using averages (across every single initialization sample) of minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the corresponding maxima. Offline.

Specific value: "mean_min_max"

threesigma

Type: const

Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.

Specific value: "threesigma"

percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.

Specific value: "percentile"

mean_percentile

Type: const

Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.

Specific value: "mean_percentile"

Type: object

Type-specific parameters of the initializer.

Type: number Default: 0.1

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.

Type: number Default: 99.9

For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: enum (of string)

The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.

Must be one of:

  • "activations"
  • "weights"

Type: boolean Default: false

[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only) or to false to export to OpenVINO-supported FakeQuantize ONNX (all quantization settings supported).

Type: enum (of string) Default: "enable"

This option controls whether to apply the overflow issue fix for the appropriate NNCF config or not. If set to disable, the fix will not be applied. If set to enable or first_layer_only, while appropriate target_devices are chosen, the fix will be applied to all layers or to the first convolutional layer respectively.

Must be one of:

  • "enable"
  • "disable"
  • "first_layer_only"

Type: object

Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.

No Additional Properties

Type: number

Gradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.

Type: number Default: 1

A zero-based index of the epoch, upon reaching which the activations will start to be quantized.

Type: number Default: 1

Epoch index upon which the weights will start to be quantized.

Type: number

Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.

Type: number Default: 30

Duration, in epochs, of the learning rate dropping process.

Type: number

Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.

Type: number Default: 0.001

Initial value of learning rate.

Type: number Default: 1e-05

Initial value of weight decay.
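The start-epoch parameters above can be illustrated with a minimal sketch. It is not NNCF's scheduler: the function name and the argument names `activations_quant_start_epoch` and `weights_quant_start_epoch` are assumptions chosen for illustration, and only the defaults of 1 and the zero-based epoch indexing come from the descriptions above.

```python
def staged_quantization_state(epoch,
                              activations_quant_start_epoch=1,
                              weights_quant_start_epoch=1):
    """Report which quantizer groups are active at a zero-based epoch index.

    A group becomes active once the epoch index reaches its start epoch
    (assumed reading of "upon reaching which ... will start to be quantized").
    """
    return {
        "activations": epoch >= activations_quant_start_epoch,
        "weights": epoch >= weights_quant_start_epoch,
    }


# With the defaults, nothing is quantized in epoch 0 and both groups
# are quantized from epoch 1 onwards.
state_epoch_0 = staged_quantization_state(0)
state_epoch_1 = staged_quantization_state(1)
```

Setting the two start epochs to different values would stagger the groups, for example quantizing activations first and weights a few epochs later.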

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
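The scope validation described above could look roughly like the following sketch. It is not NNCF's implementation: it assumes that a scope string prefixed with "{re}" is treated as a regular expression (as the "{re}conv.*" example suggests) and that any other string must equal a node name exactly. The helper names `scope_matches` and `check_scopes` are hypothetical.

```python
import re

RE_PREFIX = "{re}"  # assumed marker for regular-expression scopes


def scope_matches(scope, node_name):
    """Return True if a single scope string selects the given node name."""
    if scope.startswith(RE_PREFIX):
        return re.search(scope[len(RE_PREFIX):], node_name) is not None
    return scope == node_name


def check_scopes(scopes, node_names, validate_scopes=True):
    """Return the scopes that match no node; raise if validation is enabled."""
    if isinstance(scopes, str):  # the schema allows a single string or a list
        scopes = [scopes]
    unmatched = [s for s in scopes
                 if not any(scope_matches(s, n) for n in node_names)]
    if unmatched and validate_scopes:
        raise RuntimeError(f"Scopes not found in the model graph: {unmatched}")
    return unmatched


nodes = ["LeNet/conv2d_0", "LeNet/relu_0", "LeNet/relu_1"]
check_scopes("{re}conv.*", nodes)                                # matches a node
check_scopes(["LeNet/relu_2"], nodes, validate_scopes=False)     # no error raised
```

With validation disabled, a misspelled scope is silently ignored, which is why the default of true is the safer choice while a configuration is being written.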

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
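As a framework-free illustration of the multiplier described above, the sketch below scales only the gradients of compression-specific parameters. The function name, the dictionary-based gradient representation, and the parameter names in the usage lines are all hypothetical; in a real PyTorch model the gradients would be tensors.

```python
def scale_compression_gradients(gradients, compression_param_names, multiplier=None):
    """Multiply gradients of compression parameters by `multiplier`.

    `gradients` maps parameter names to gradient values. If `multiplier`
    is None (unspecified), gradients are returned unchanged.
    """
    if multiplier is None:
        return dict(gradients)
    return {
        name: grad * multiplier if name in compression_param_names else grad
        for name, grad in gradients.items()
    }


grads = {"conv.weight": 0.5, "quantizer.scale": 0.5}
scaled = scale_compression_gradients(grads, {"quantizer.scale"}, multiplier=10.0)
# Only the (hypothetical) compression parameter "quantizer.scale" is affected.
```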

Type: object
No Additional Properties

Type: object
No Additional Properties

Type: enum (of string)

Defines the training strategy for tuning the supernet. Defaults to progressive shrinking.

Must be one of:

  • "progressive_shrinking"

Type: array of string

Defines the order of adding a new elasticity dimension from stage to stage

No Additional Items

Each item of this array must be:


Example:

[
    "width",
    "depth",
    "kernel"
]

Type: object

This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.

No Additional Properties

Type: number Default: 2000

Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.

Type: object
No Additional Properties

Type: array of object

List of parameters for each supernet training stage.

No Additional Items

Each item of this array must be:

Type: object

Defines a supernet training stage: how many epochs it takes, which elasticities are enabled and with which settings, and whether certain operations should happen at the beginning of the stage.

No Additional Properties

Type: array of string

Elasticity dimensions that are enabled for subnet sampling; the remaining elastic dimensions are disabled.

No Additional Items

Each item of this array must be:

Type: number

Restricts the maximum number of blocks in each independent group that can be skipped. For example, ResNet50 has four independent groups, each consisting of a specific number of Bottleneck layers ([3, 4, 6, 3]) that can potentially be skipped. If the depth indicator equals 1, only the last Bottleneck in each group can be skipped; if it equals 2, the last two can be skipped, and so on. This makes it possible to implement the progressive shrinking logic from the Once-for-All paper. Default value is 1.
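The ResNet50 example can be made concrete with a small sketch. It only restates the rule from the description (the last N blocks of each group are candidates for skipping) and is not NNCF's code; `skippable_blocks` is a hypothetical helper and blocks are identified by zero-based index within their group.

```python
def skippable_blocks(group_sizes, depth_indicator=1):
    """For each group, list the block indices that may be skipped.

    Only the last `depth_indicator` blocks of a group are candidates.
    """
    if depth_indicator < 0:
        raise ValueError("depth_indicator must be non-negative")
    result = []
    for size in group_sizes:
        count = min(depth_indicator, size)
        result.append(list(range(size - count, size)))
    return result


resnet50_groups = [3, 4, 6, 3]  # Bottleneck layers per group, from the text
last_only = skippable_blocks(resnet50_groups, depth_indicator=1)
last_two = skippable_blocks(resnet50_groups, depth_indicator=2)
```

Raising the indicator from stage to stage gradually enlarges the set of shallower subnets that can be sampled.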

Type: number

Restricts the maximum number of width values in each elastic layer. For example, a conv2d layer with elastic width may vary its number of output channels over the list [8, 16, 32]. If the width indicator equals 1, only the maximum number of channels (32) can be activated. If it equals 2, either of the last two values (16 or 32) can be selected.
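A minimal sketch of the width indicator rule, using the [8, 16, 32] example from the description. The helper name `active_widths` is hypothetical and the function only restates the rule; it is not taken from NNCF.

```python
def active_widths(width_values, width_indicator):
    """Return the largest `width_indicator` width values, in ascending order."""
    if width_indicator < 1:
        raise ValueError("width_indicator must be at least 1")
    ordered = sorted(width_values)
    return ordered[-width_indicator:]


widths = [8, 16, 32]
only_max = active_widths(widths, 1)   # the maximum width only
top_two = active_widths(widths, 2)    # the two largest widths
```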

Type: boolean

If True, triggers reorganization of weights at the beginning of the stage so that filters are sorted by importance (e.g. by L2 norm).

Type: boolean

If True, triggers BatchNorm adaptation at the beginning of the stage.

Type: number

Initial learning rate for a stage. If specified in the stage descriptor, it will trigger a reset of the learning rate at the beginning of the stage.

Type: number

Number of epochs to compute the adjustment of the learning rate.

Type: number

Number of iterations to activate the random subnet. Default value is 1.

Type: object
No Additional Properties

Type: object
No Additional Properties

Type: array of array

List of building blocks to be skipped. Each block is defined by the names of its start and end nodes. The start node is executed, whereas the end node is skipped: the tensor produced by the start node bypasses the skipped nodes and is passed to the node that follows the end node.

No Additional Items

Each item of this array must be:


Example:

[
    [
        "start_op_1",
        "end_op_1"
    ],
    [
        "start_op_2",
        "end_op_2"
    ]
]
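The skipping semantics can be sketched on a simple linear chain of operations. This is an illustration under stated assumptions rather than NNCF's graph logic: the model is assumed to be a plain sequence of named functions, and `run_with_skipped_block` is a hypothetical helper.

```python
def run_with_skipped_block(ops, x, block=None):
    """Run `ops` (a list of (name, function) pairs) on input `x`.

    If `block` is a (start, end) pair of names, the start op is executed
    and every op after it up to and including the end op is skipped.
    Returns the final value and the names of the executed ops.
    """
    names = [name for name, _ in ops]
    skipped = set()
    if block is not None:
        start, end = block
        start_idx, end_idx = names.index(start), names.index(end)
        skipped = set(names[start_idx + 1:end_idx + 1])
    executed = []
    for name, fn in ops:
        if name in skipped:
            continue  # the tensor bypasses this op unchanged
        x = fn(x)
        executed.append(name)
    return x, executed


ops = [
    ("start_op_1", lambda v: v + 1),
    ("middle_op", lambda v: v * 10),
    ("end_op_1", lambda v: v * 10),
    ("next_op", lambda v: v + 2),
]
result, executed = run_with_skipped_block(ops, 1, block=("start_op_1", "end_op_1"))
```

Here the output of start_op_1 is passed directly to next_op, which is the behaviour the description calls bypassing.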

Type: number

Defines the minimal number of operations in a skipping block. This option is available in auto mode only. Default value is 5.

Type: number

Defines the maximal number of operations in a skipping block. This option is available in auto mode only. Default value is 50.

Type: boolean

If True, automatic block search will not split operations that are fused at inference time across different skipping blocks. True by default.

Type: object
No Additional Properties

Type: number

Minimal number of output channels that can be activated for each layer with elastic width. Default value is 32.

Type: number

Restricts the total number of different elastic width values for each layer. The default value of -1 means that there are no restrictions.

Type: number

Defines the step size for generating the elastic width search space, i.e. the list of all possible width values for each layer. Generation starts from the number of output channels in the original model and stops when it reaches either the min_width value or a number of generated width values equal to max_num_widths.
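The generation procedure described above can be sketched as follows. The exact boundary handling (whether min_width itself is included, and any channel alignment) is an assumption, and `generate_width_space` is a hypothetical name; only the defaults min_width=32 and max_num_widths=-1 are taken from the descriptions in this section.

```python
def generate_width_space(original_width, width_step, min_width=32, max_num_widths=-1):
    """Generate candidate widths, starting from the original width.

    Widths decrease by `width_step` and generation stops when the next value
    would fall below `min_width` or when `max_num_widths` values exist
    (-1 means no limit on the number of values).
    """
    if width_step <= 0:
        raise ValueError("width_step must be positive")
    widths = []
    width = original_width
    while width >= min_width:
        if max_num_widths != -1 and len(widths) >= max_num_widths:
            break
        widths.append(width)
        width -= width_step
    return widths


full_space = generate_width_space(128, width_step=32)
limited_space = generate_width_space(128, width_step=32, max_num_widths=2)
```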

Type: array of number

Defines elastic width search space via a list of multipliers. All possible width values are obtained by multiplying the original width value with the values in the given list.

No Additional Items

Each item of this array must be:
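A short sketch of how a multiplier list maps to concrete widths. Rounding to the nearest integer and clamping to at least one channel are assumptions made for this illustration, and `widths_from_multipliers` is a hypothetical helper.

```python
def widths_from_multipliers(original_width, width_multipliers):
    """Multiply the original width by each multiplier (rounding is assumed)."""
    return [max(1, round(original_width * m)) for m in width_multipliers]


# A layer with 64 output channels and three multipliers.
candidate_widths = widths_from_multipliers(64, [1.0, 0.75, 0.5])
```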

Type: string

The type of filter importance metric. Can be one of L1, L2, geometric_median, external. L2 by default.

Type: string

Path to the custom external weight importance (PyTorch tensor) for each node that needs weight reordering. Valid only when filter_importance is external. The file should be loadable via torch.load() and contain a dictionary that maps each NNCF node name to an importance tensor with the same shape as the weights in the node module. For example, if the node Model/NNCFLinear[fc1]/linear_0 has a 3x1 linear module with weight [0.2, 0.3, 0.9], then the dictionary entry {'Model/NNCFLinear[fc1]/linear_0': tensor([0.4, 0.01, 0.2])} represents the corresponding weight importance.

Type: object
No Additional Properties

Type: number

Restricts the total number of different elastic kernel values for each layer. The default value of -1 means that there are no restrictions.

Type: array of string

Defines the available elasticity dimension for sampling subnets. By default, all elastic dimensions are available - [width, depth, kernel]

No Additional Items

Each item of this array must be:

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: object
No Additional Properties

Type: number

Defines a global learning rate scheduler. If these parameters are not set, a stage learning rate scheduler will be used.

Type: number

Defines the number of samples used for each training epoch.

Type: object
No Additional Properties

Type: const
Specific value: "movement_sparsity"

Type: object
No Additional Properties

Type: number

Index of the starting epoch (inclusive) of the warmup stage.

Type: number

Index of the end epoch (exclusive) of the warmup stage.

Type: number

The regularization factor on weight importance scores. With a larger positive value, more model weights will be regarded as less important and thus be sparsified.

Type: boolean Default: true

Whether to do structured mask resolution after the warmup stage. Currently, structured masking is supported only on multi-head self-attention blocks and feed-forward networks.

Type: number Default: 3.0

The power value of polynomial decay for threshold and regularization factor update during warmup stage.

Type: number

The initial value of the importance threshold during the warmup stage. If not specified, it is determined automatically during training so that the model has about 0.1% linear layer sparsity on the involved layers at the beginning of the warmup stage.

Type: number Default: 0.0

The final value of importance threshold during warmup stage.
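The polynomial decay of the importance threshold during warmup might be sketched as below. The exact formula used by NNCF is not given in this section, so the interpolation here is an assumption: the threshold moves from its initial to its final value with a (1 - progress) ** power decay. The function and argument names are hypothetical; the defaults of 0.0 for the final threshold and 3.0 for the power come from the descriptions above.

```python
def warmup_threshold(step, total_steps, init_threshold, final_threshold=0.0, power=3.0):
    """Interpolate the importance threshold over the warmup stage.

    Assumed form: final + (init - final) * (1 - progress) ** power,
    where progress runs from 0 to 1 across the warmup steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    decay = (1.0 - progress) ** power
    return final_threshold + (init_threshold - final_threshold) * decay


# Threshold at the start, middle, and end of a 100-step warmup.
schedule = [warmup_threshold(s, 100, init_threshold=-1.0) for s in (0, 50, 100)]
```

With a power of 3.0 the threshold changes quickly at the start of warmup and slowly near the end; this would also explain why the number of steps per epoch needs to be known.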

Type: number

Number of training steps in one epoch, used for proper threshold and regularization factor updates. Optional if warmup_start_epoch >= 1, since the steps can then be counted during the first epoch; otherwise it must be specified.

Type: array of object

Describes how each supported layer will be sparsified.

No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: enum (of string)

Defines in which mode a supported layer will be sparsified.

Must be one of:

  • "fine"
  • "block"
  • "per_dim"

Type: array of number

The block shape for weights to sparsify. Required when mode="block".

No Additional Items

Each item of this array must be:

Type: number

The dimension for weights to sparsify. Required when mode="per_dim".

Type: array of string or string

Model control flow graph node scopes to be considered in this mode.

No Additional Items

Each item of this array must be:
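The mode-dependent requirements above (a block shape for mode="block", a dimension for mode="per_dim") can be checked with a sketch like the one below. The key names "mode", "sparse_factors", "axis", and "target_scopes" are assumptions, since this section does not show the property names; `check_sparse_structure` is a hypothetical helper, not part of NNCF.

```python
ALLOWED_MODES = {"fine", "block", "per_dim"}


def check_sparse_structure(entry):
    """Validate one sparse-structure entry against the rules in the text."""
    mode = entry.get("mode")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    if mode == "block" and "sparse_factors" not in entry:
        raise ValueError('a block shape is required when mode="block"')
    if mode == "per_dim" and "axis" not in entry:
        raise ValueError('a dimension is required when mode="per_dim"')
    return True


# Hypothetical entries; the scope strings are placeholders.
entries = [
    {"mode": "block", "sparse_factors": [32, 32], "target_scopes": "{re}.*attention.*"},
    {"mode": "per_dim", "axis": 0, "target_scopes": "{re}.*intermediate.*"},
]
all_valid = all(check_sparse_structure(e) for e in entries)
```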

Type: array of string or string

A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]

Type: array of string or string

A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.

No Additional Items

Each item of this array must be:


Examples:

[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"

Type: boolean Default: true

If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

array_of_objects_version

Type: array
No Additional Items

Each item of this array must be:


Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i0
Type: object

Applies filter pruning during training of the model to effectively remove entire sub-dimensions of tensors in the original model from computation and therefore increase performance.
See Pruning.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i1
Type: object

Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its magnitude. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i2
Type: object

Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its importance as determined by the regularization-based sparsity algorithm. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i3
Type: object

This algorithm is only useful in combination with other compression algorithms and improves the end accuracy result of the corresponding algorithm by calculating knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i4
Type: object

This algorithm takes no additional parameters and is used when you want to load a checkpoint trained with another sparsity algorithm and do other compression without changing the sparsity mask.

Same definition as compression_oneOf_i0_oneOf_i5
Type: object

Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.

Same definition as compression_oneOf_i0_oneOf_i6
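Since the compression section may be given either as a single object or as an array of objects, code that consumes a configuration may want to normalize both forms. The sketch below is a hypothetical helper, not an NNCF function, and the "algorithm" key and the "quantization" value are assumptions; "movement_sparsity" is the constant value listed earlier in this schema.

```python
import json


def normalize_compression(compression):
    """Return the compression section as a list of algorithm objects."""
    if isinstance(compression, dict):
        return [compression]
    if isinstance(compression, list):
        return list(compression)
    raise TypeError("compression must be an object or an array of objects")


config_text = """
{
    "compression": [
        {"algorithm": "quantization"},
        {"algorithm": "movement_sparsity"}
    ]
}
"""
config = json.loads(config_text)
algorithms = [entry["algorithm"] for entry in normalize_compression(config["compression"])]
```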


Options for the execution of the NNCF-powered 'Accuracy Aware' training pipeline. The 'mode' property determines the mode of the accuracy-aware training execution and further available parameters.

early_exit

Type: object

Early exit mode schema. See EarlyExitTraining.md for more general info on this mode.

No Additional Properties

Type: const
Specific value: "early_exit"


No Additional Properties

Type: object

The following properties are required:

  • maximal_relative_accuracy_degradation
Type: object

The following properties are required:

  • maximal_absolute_accuracy_degradation

Type: number

Maximally allowed accuracy degradation of the model in percent relative to the original model accuracy.

Type: number

Maximally allowed accuracy degradation of the model in units of absolute metric of the original model.
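The difference between the relative and the absolute criterion can be shown with a small sketch. It assumes that exactly one of the two limits is given, as the two alternative sets of required properties suggest, and that a higher metric value is better. `accuracy_criterion_met` and its argument names are hypothetical.

```python
def accuracy_criterion_met(original_accuracy, compressed_accuracy, *,
                           max_relative_degradation=None,
                           max_absolute_degradation=None):
    """Check whether the accuracy drop stays within the configured limit.

    The relative limit is in percent of the original accuracy; the absolute
    limit is in units of the metric itself. Exactly one must be given.
    """
    if (max_relative_degradation is None) == (max_absolute_degradation is None):
        raise ValueError("specify exactly one of the two degradation limits")
    drop = original_accuracy - compressed_accuracy
    if max_relative_degradation is not None:
        return drop / original_accuracy * 100.0 <= max_relative_degradation
    return drop <= max_absolute_degradation


# An 80.0 -> 78.0 drop is 2.5% relative but 2.0 units absolute.
relative_ok = accuracy_criterion_met(80.0, 78.0, max_relative_degradation=1.0)
absolute_ok = accuracy_criterion_met(80.0, 78.0, max_absolute_degradation=2.5)
```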

Type: number Default: 10000

The maximal total fine-tuning epoch count. If the accuracy criteria are not reached during fine-tuning, the most accurate model will be returned.

adaptive_compression_level

Type: object

Adaptive compression level training mode schema. See AdaptiveCompressionLevelTraining.md for more general info on this mode.

No Additional Properties

Type: const
Specific value: "adaptive_compression_level"


No Additional Properties

Type: object

The following properties are required:

  • maximal_relative_accuracy_degradation
Type: object

The following properties are required:

  • maximal_absolute_accuracy_degradation

Type: number

Maximally allowed accuracy degradation of the model in percent relative to the original model accuracy.

Type: number

Maximally allowed accuracy degradation of the model in units of absolute metric of the original model.

Type: number Default: 5

Number of epochs to fine-tune during the initial training phase of the adaptive compression training loop.

Type: number Default: 0.1

Initial value for the compression rate increase/decrease training phase of the compression training loop.

Type: number Default: 0.5

Factor used to reduce the compression rate change step in the adaptive compression training loop.

Type: number Default: 0.5

Factor used to reduce the learning rate after the compression rate step is reduced.

Type: number Default: 0.025

The minimal compression rate change step value after which the training loop is terminated.

Type: number Default: 3

The number of epochs to fine-tune the model for a given compression rate after the initial training phase of the training loop.

Type: number Default: 10000

The maximal total fine-tuning epoch count. If the epoch counter reaches this number, the fine-tuning process will stop and the model with the largest compression rate will be returned.
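How the step-related defaults above interact can be illustrated by listing the sequence of compression rate step sizes. This is only a sketch of the arithmetic: when exactly NNCF reduces the step is not described in this section, and the function and argument names are hypothetical. The defaults 0.1, 0.5, and 0.025 are the ones listed above.

```python
def compression_rate_steps(initial_step=0.1, reduction_factor=0.5, minimal_step=0.025):
    """List the step sizes used until the step falls below `minimal_step`."""
    if not 0.0 < reduction_factor < 1.0:
        raise ValueError("reduction_factor must be between 0 and 1")
    if initial_step <= 0.0 or minimal_step <= 0.0:
        raise ValueError("step values must be positive")
    steps = []
    step = initial_step
    while step >= minimal_step:
        steps.append(step)
        step *= reduction_factor
    return steps


# With the defaults the step is reduced twice before the loop would terminate.
default_steps = compression_rate_steps()
```

Under these assumptions the defaults give three usable step sizes, so a smaller minimal step allows a finer search for the final compression rate at the cost of more fine-tuning epochs.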

Type: number

PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.

Type: boolean

[Deprecated] Whether to enable strict input tensor shape matching when building the internal graph representation of the model. Set this to false if your model inputs have any variable dimension other than the 0-th (batch) dimension, or if any non-batch dimension of the intermediate tensors in your model execution flow depends on the input dimension, otherwise the compression will most likely fail.

Type: string

Log directory for NNCF-specific logging outputs.