nncf.torch.quantization.algo#

Contains builder and controller class definitions for the quantization algorithm.

Classes#

QuantizationController

Controller for the quantization algorithm in PT.

class nncf.torch.quantization.algo.QuantizationController(target_model, config, debug_interface, weight_quantizers, non_weight_quantizers, groups_of_adjacent_quantizers, quantizers_input_shapes, build_time_metric_info=None, build_time_range_init_params=None)[source]#

Bases: nncf.torch.quantization.base_ctrl.QuantizationControllerBase

Controller for the quantization algorithm in PT.

Parameters:
  • target_model (nncf.torch.nncf_network.NNCFNetwork) –

  • config (nncf.config.NNCFConfig) –

  • debug_interface (nncf.torch.quantization.debug_interface.QuantizationDebugInterface) –

  • weight_quantizers (Dict[nncf.common.quantization.structs.WeightQuantizerId, nncf.torch.quantization.structs.WeightQuantizerInfo]) –

  • non_weight_quantizers (Dict[nncf.common.quantization.structs.NonWeightQuantizerId, nncf.torch.quantization.structs.NonWeightQuantizerInfo]) –

  • groups_of_adjacent_quantizers (nncf.torch.quantization.precision_init.adjacent_quantizers.GroupsOfAdjacentQuantizers) –

  • quantizers_input_shapes (Dict[nncf.common.quantization.structs.QuantizerId, Tuple[int]]) –

  • build_time_metric_info (nncf.torch.quantization.metrics.QuantizationShareBuildTimeInfo) –

  • build_time_range_init_params (nncf.torch.quantization.init_range.PTRangeInitParams) –
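In typical use the controller is not constructed directly with these arguments; it is returned (together with the wrapped model) by nncf.torch.create_compressed_model when the NNCF config requests the quantization algorithm. A minimal config fragment for that path might look like the following (sample_size values are illustrative):

```json
{
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"}
}
```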

property scheduler: nncf.api.compression.CompressionScheduler[source]#

The compression scheduler for this particular algorithm combination.

Return type:

nncf.api.compression.CompressionScheduler

property loss: nncf.api.compression.CompressionLoss[source]#

The compression loss for this particular algorithm combination.

Return type:

nncf.api.compression.CompressionLoss

prepare_for_export()[source]#

Prepare the compressed model for exporting to a backend-specific model serialization format.

distributed()[source]#

Should be called when distributed training with multiple training processes is going to be used (i.e. after the model is wrapped with DistributedDataParallel). Any special preparations for the algorithm to properly support distributed training should be made inside this function.

compression_stage()[source]#

Returns the compression stage. Should be used when saving best checkpoints to distinguish between uncompressed, partially compressed, and fully compressed models.

Returns:

The compression stage of the target model.

Return type:

nncf.api.compression.CompressionStage

init_precision(precision_init_type, precision_init_params, precision_constraints)[source]#

Precision initialization is based on a measure of each layer's sensitivity to perturbations. The measure is computed by estimating the average Hessian trace for each layer with the Hutchinson algorithm.

Parameters:
  • precision_init_type (str) –

  • precision_init_params (nncf.torch.quantization.precision_init.base_init.BasePrecisionInitParams) –

  • precision_constraints (nncf.torch.quantization.precision_constraints.HardwareQuantizationConstraints) –

Return type:

nncf.common.quantization.quantizer_setup.SingleConfigQuantizerSetup
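The Hutchinson algorithm mentioned above estimates the trace of a matrix (here, a layer's Hessian) using only matrix-vector products. A minimal stand-alone sketch of the estimator, with an explicit toy matrix in place of a real Hessian (names and setup are illustrative, not NNCF's actual implementation):

```python
import random

def hutchinson_trace(matvec, dim, num_samples=2000, seed=0):
    """Estimate trace(A) given only the matrix-vector product A @ v.

    Uses Rademacher probe vectors v with i.i.d. +/-1 entries,
    for which E[v^T A v] = trace(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)                                   # A @ v
        total += sum(vi * avi for vi, avi in zip(v, av))  # v^T (A v)
    return total / num_samples

# Toy check against an explicit 2x2 matrix whose exact trace is 5.
A = [[2.0, 1.0], [1.0, 3.0]]
matvec = lambda v: [sum(a * x for a, x in zip(row, v)) for row in A]
estimate = hutchinson_trace(matvec, dim=2)
```

In the precision-initialization setting the matrix-vector product would be a Hessian-vector product obtained via automatic differentiation, so the Hessian never has to be materialized.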

init_range(range_init_params=None)[source]#

Tracks input statistics for quantizers in the model and sets the ranges of the quantizers to correspond to the minimum and maximum input tensor levels observed.

Parameters:

range_init_params (nncf.torch.quantization.init_range.PTRangeInitParams) – Specifies parameters for this range initialization call; if None, the parameters that were used during compressed model creation will be used.
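The min/max tracking performed by init_range can be sketched as follows. This is an illustrative stand-alone observer, not NNCF's actual statistic-collection classes:

```python
class MinMaxObserver:
    """Track the minimum and maximum input values seen across batches;
    the final range is then used to set a quantizer's levels."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")

    def observe(self, batch):
        self.min = min(self.min, min(batch))
        self.max = max(self.max, max(batch))

obs = MinMaxObserver()
for batch in ([0.1, -0.4, 2.0], [1.5, -1.2, 0.3]):
    obs.observe(batch)
# After both batches the observed range is [-1.2, 2.0]; a quantizer's
# levels would be set to cover exactly this interval.
```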

statistics(quickly_collected_only=False)[source]#

Returns a Statistics class instance that contains compression algorithm statistics.

Parameters:

quickly_collected_only – Enables collection of only those statistics that do not take much time to compute. Can be helpful when statistics need to be tracked on each training batch/step/iteration.

Return type:

nncf.common.statistics.NNCFStatistics

strip_model(model, do_copy=False)[source]#

Strips auxiliary layers that were used for model compression, as they are only needed during training. The method is used before exporting the model to the target format.

Parameters:
  • model (nncf.torch.nncf_network.NNCFNetwork) – The compressed model.

  • do_copy (bool) – If True, modify a copy of the model rather than the model itself; defaults to False.

Returns:

The stripped model.

Return type:

nncf.torch.nncf_network.NNCFNetwork
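The auxiliary layers stripped here are the fake-quantization (quantize-dequantize) operations inserted for training. A conceptual sketch of what one such operation does to a scalar (illustrative only, not NNCF's actual quantizer kernel):

```python
def fake_quantize(x, scale, num_bits=8):
    """Quantize-dequantize: snap x to one of 2**num_bits integer levels,
    then map back to float, so training sees the quantization error
    while the tensor itself stays floating point."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for signed 8-bit
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

# 0.26 / 0.1 = 2.6 rounds to level 3, giving ~0.3 after dequantization.
y = fake_quantize(0.26, scale=0.1)
```

During training these operations simulate quantization error in the forward pass; at export time the stripped model (or the exported format's native quantization operators) takes their place.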