otx.algorithms.segmentation.adapters.mmseg#

OTX Adapters - mmseg.

Classes

OTXSegDataset(**kwargs)

Wrapper dataset that allows using an OTX dataset to train models.

LiteHRNet(extra[, in_channels, conv_cfg, ...])

Lite-HRNet backbone.

MMOVBackbone(*args, **kwargs)

MMOVBackbone.

MMOVDecodeHead([model_path_or_model, ...])

MMOVDecodeHead.

DetConLoss([temperature, ...])

Modified from deepmind/detcon.

SelfSLMLP(in_channels, hid_channels, ...[, ...])

The SelfSLMLP neck: fc/conv-bn-relu-fc/conv.

ConstantScalarScheduler([scale])

The learning rate remains constant over time.

PolyScalarScheduler(start_scale, end_scale, ...)

The learning rate changes over time according to a polynomial schedule.

StepScalarScheduler(scales, num_iters[, ...])

Step learning rate scheduler.

DetConB(backbone[, neck, head, pretrained, ...])

DetCon Implementation.

CrossEntropyLossWithIgnore(*args, **kwargs)

CrossEntropyLossWithIgnore with Ignore Mode Support for Class Incremental Learning.

SupConDetConB(backbone[, decode_head, neck, ...])

Apply DetConB as a contrastive part of Supervised Contrastive Learning (https://arxiv.org/abs/2004.11362).

MeanTeacherSegmentor(orig_type[, ...])

Mean teacher segmentor for semi-supervised learning.

class otx.algorithms.segmentation.adapters.mmseg.ConstantScalarScheduler(scale: float = 30.0)[source]#

Bases: BaseScalarScheduler

The learning rate remains constant over time.

The learning rate equals the scale.

Parameters:

scale (float) – The learning rate scale.

class otx.algorithms.segmentation.adapters.mmseg.CrossEntropyLossWithIgnore(*args, **kwargs)[source]#

Bases: CrossEntropyLoss

CrossEntropyLossWithIgnore with Ignore Mode Support for Class Incremental Learning.

When new classes are added through continual training cycles, images from previous cycles may become partially annotated if they are not revisited. To prevent the model from predicting these new classes for such images, CrossEntropyLossWithIgnore can be used to ignore the unseen classes.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(cls_score: Tensor | None, label: Tensor | None, weight: Tensor | None = None, avg_factor: int | None = None, reduction_override: str | None = 'mean', ignore_index: int = 255, valid_label_mask: Tensor | None = None, **kwargs)[source]#

Forward.

Parameters:
  • cls_score (torch.Tensor, optional) – The prediction with shape (N, 1).

  • label (torch.Tensor, optional) – The learning label of the prediction.

  • weight (torch.Tensor, optional) – Sample-wise loss weight. Default: None.

  • class_weight (list[float], optional) – The weight for each class. Default: None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Default: None.

  • reduction_override (str, optional) – The method used to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Default: ‘mean’.

  • ignore_index (int) – Specifies a target value that is ignored and does not contribute to the input gradients. When avg_non_ignore is True and the reduction is 'mean', the loss is averaged over non-ignored targets. Default: 255.

  • valid_label_mask (torch.Tensor, optional) – The valid labels with shape (N, num_classes). If a value in valid_label_mask is 0, the mask label of the class at that index is ignored, like ignore_index (see the sketch after this parameter list).

  • **kwargs (Any) – Additional keyword arguments.
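
Example

A minimal sketch of the ignore-mode semantics in plain PyTorch (illustrative only; the loss class itself applies class weights and reduction internally, and the shapes below are chosen just for the example):

import torch
import torch.nn.functional as F

# Two images, four classes; classes 2 and 3 are unseen for image 0.
num_classes = 4
cls_score = torch.randn(2, num_classes, 8, 8)     # (N, C, H, W) logits
label = torch.randint(0, num_classes, (2, 8, 8))  # (N, H, W) targets

# valid_label_mask has shape (N, num_classes); 0 marks classes to ignore.
valid_label_mask = torch.tensor([[1.0, 1.0, 0.0, 0.0],
                                 [1.0, 1.0, 1.0, 1.0]])

# Zero out pixels whose target class is invalid for that image --
# the effect CrossEntropyLossWithIgnore achieves via valid_label_mask.
pixel_loss = F.cross_entropy(cls_score, label, reduction="none")  # (N, H, W)
valid_pixels = valid_label_mask.gather(1, label.flatten(1)).view_as(label)
loss = (pixel_loss * valid_pixels).sum() / valid_pixels.sum().clamp(min=1)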

property loss_name#

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by a simple sum operation. In addition, if you want this loss item to be included in the backward graph, the name must start with the prefix loss_.

Returns:

The name of this loss item.

Return type:

str

class otx.algorithms.segmentation.adapters.mmseg.DetConB(backbone: Dict[str, Any], neck: Dict[str, Any] | None = None, head: Dict[str, Any] | None = None, pretrained: str | None = None, base_momentum: float = 0.996, num_classes: int = 256, num_samples: int = 16, downsample: int = 32, input_transform: str = 'resize_concat', in_index: List[int] | int = [0], align_corners: bool = False, **kwargs)[source]#

Bases: Module

DetCon Implementation.

Implementation of ‘Efficient Visual Pretraining with Contrastive Detection’ (https://arxiv.org/abs/2103.10957).

Parameters:
  • backbone (dict) – Config dict for module of backbone ConvNet.

  • neck (dict, optional) – Config dict for module of deep features to compact feature vectors. Default: None.

  • head (dict, optional) – Config dict for module of loss functions. Default: None.

  • pretrained (str, optional) – Path to pre-trained weights. Default: None.

  • base_momentum (float) – The base momentum coefficient for the target network. Default: 0.996.

  • num_classes (int) – The number of classes to be considered as pseudo classes. Default: 256.

  • num_samples (int) – The number of samples to be sampled. Default: 16.

  • downsample (int) – The ratio of the mask size to the feature size. Default: 32.

  • input_transform (str) – Input transform of features from backbone. Default: “resize_concat”.

  • in_index (list) – Feature index to be used for DetCon if the backbone outputs multi-scale features wrapped by list or tuple. Default: [0].

  • align_corners (bool) – Whether apply align_corners during resize. Default: False.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
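
Example

A hypothetical config sketch, not a tested OTX recipe: it wires DetConB to the SelfSLMLP neck and DetConLoss head documented on this page; the backbone entry and all channel sizes are placeholders:

# Hypothetical mmseg-registry config; the backbone's `extra` dict and the
# channel sizes must be filled in from a real model template.
model = dict(
    type="DetConB",
    backbone=dict(type="LiteHRNet", extra=dict()),  # placeholder config
    neck=dict(
        type="SelfSLMLP",
        in_channels=40, hid_channels=256, out_channels=128,  # illustrative
        use_conv=True, with_avg_pool=False,
    ),
    head=dict(type="DetConLoss", temperature=0.1),
    base_momentum=0.996,
    num_classes=256,
    num_samples=16,
    downsample=32,
)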

extract_feat(img: Tensor)[source]#

Extract features from images.

Parameters:

img (Tensor) – Input image.

Returns:

Features from the online_backbone.

Return type:

Tensor

forward(img, img_metas, return_loss=True, **kwargs)[source]#

Calls either forward_train() or forward_test() depending on whether return_loss is True.

Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.

forward_train(img: Tensor, img_metas: List[Dict], gt_semantic_seg: Tensor, return_embedding: bool = False)[source]#

Forward function for training.

Parameters:
  • img (Tensor) – Input images.

  • img_metas (list[dict]) – Input information.

  • gt_semantic_seg (Tensor) – Pseudo masks. It is used to organize features among the same classes.

  • return_embedding (bool) – Whether returning embeddings from the online backbone. It can be used for SupCon. Default: False.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

init_weights(pretrained: str | None = None)[source]#

Initialize the weights of model.

Parameters:

pretrained (str, optional) – Path to pre-trained weights. Default: None.

sample_masked_feats(feats: Tensor | List | Tuple, masks: Tensor, projector: Module)[source]#

Sample features according to the given masks.

Parameters:
  • feats (list, tuple, Tensor) – Features from the backbone.

  • masks (Tensor) – Ground truth masks to be sampled and to be used to filter feats.

  • projector (nn.Module) – Projector MLP.

Returns:

(proj, sampled_mask_ids), features from the projector and ids used to sample masks.

Return type:

tuple[Tensor, Tensor]

set_step_params(init_iter, epoch_size)[source]#

set_step_params is skipped (no-op) for this model.

static state_dict_hook(module, state_dict, *args, **kwargs)[source]#

Save only online backbone as output state_dict.

train_step(data_batch: Dict[str, Any], optimizer: Optimizer | Dict, **kwargs)[source]#

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

Parameters:
  • data_batch (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

  • **kwargs (Any) – Additional keyword arguments.

Returns:

It should contain at least 3 keys: loss, log_vars, num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type:

dict

transform_inputs(inputs: List | Tuple)[source]#

Transform inputs for decoder.

Parameters:

inputs (list, tuple) – List (or tuple) of multi-level img features.

Returns:

The transformed inputs.

Return type:

Tensor

val_step(**kwargs)[source]#

Disable the validation step during self-supervised learning.

class otx.algorithms.segmentation.adapters.mmseg.DetConLoss(temperature: float = 0.1, use_replicator_loss: bool = True, ignore_index: int = 255)[source]#

Bases: Module

Modified from deepmind/detcon.

Compute the NCE scores from pairs of predictions and targets. This implements the batched form of the loss described in Section 3.1, Equation 3 in https://arxiv.org/pdf/2103.10957.pdf.

Parameters:
  • temperature (float) – The temperature to use for the NCE loss.

  • use_replicator_loss (bool) – Whether to use cross-replica samples.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(pred1, pred2, target1, target2, pind1, pind2, tind1, tind2, local_negatives=True)[source]#

Forward loss.

Parameters:
  • pred1 (Tensor) – (b, num_samples, d) the prediction from first view.

  • pred2 (Tensor) – (b, num_samples, d) the prediction from second view.

  • target1 (Tensor) – (b, num_samples, d) the projection from first view.

  • target2 (Tensor) – (b, num_samples, d) the projection from second view.

  • pind1 (Tensor) – (b, num_samples) mask indices for first view’s prediction.

  • pind2 (Tensor) – (b, num_samples) mask indices for second view’s prediction.

  • tind1 (Tensor) – (b, num_samples) mask indices for first view’s projection.

  • tind2 (Tensor) – (b, num_samples) mask indices for second view’s projection.

  • local_negatives (bool) – whether to include local negatives.

Returns:

A dictionary containing the single scalar loss for the XT-NCE objective.

Return type:

dict[str, Tensor]
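
A smoke-test sketch of the forward interface, assuming the (b, num_samples, d) shapes from the parameter list above; the feature dimension and mask-id range are arbitrary choices for the example:

import torch
from otx.algorithms.segmentation.adapters.mmseg import DetConLoss

b, n, d = 2, 16, 128                       # batch, samples per image, feature dim
loss_fn = DetConLoss(temperature=0.1, use_replicator_loss=False)
pred1, pred2 = torch.randn(b, n, d), torch.randn(b, n, d)
targ1, targ2 = torch.randn(b, n, d), torch.randn(b, n, d)
ids = torch.randint(0, 256, (b, n))        # sampled mask indices
out = loss_fn(pred1, pred2, targ1, targ2, ids, ids, ids, ids)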

get_distributed_tensors(target1, target2, batch_size, num_samples, num_features, device)[source]#

Grab tensors across replicas during distributed training.

class otx.algorithms.segmentation.adapters.mmseg.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg=None, norm_eval=False, with_cp=False, zero_init_residual=False, dropout=None, init_cfg=None)[source]#

Bases: BaseModule

Lite-HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions

Parameters:
  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for the last norm layer in resblocks so that they behave as identity mappings.

Initialize BaseModule, inherited from torch.nn.Module

forward(x)[source]#

Forward function.

init_weights(pretrained=None)[source]#

Initialize the weights in backbone.

Parameters:

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

train(mode=True)[source]#

Convert the model into training mode.

class otx.algorithms.segmentation.adapters.mmseg.MMOVBackbone(*args, **kwargs)[source]#

Bases: MMOVModel

MMOVBackbone.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)[source]#

Forward.

init_weights(pretrained=None)[source]#

Initialize the weights.

class otx.algorithms.segmentation.adapters.mmseg.MMOVDecodeHead(model_path_or_model: str | Model | None = None, weight_path: str | None = None, inputs: Dict[str, str | List[str]] | None = None, outputs: Dict[str, str | List[str]] | None = None, init_weight: bool = False, verify_shape: bool = True, *args, **kwargs)[source]#

Bases: BaseDecodeHead

MMOVDecodeHead.

Initialize BaseModule, inherited from torch.nn.Module

forward(inputs)[source]#

Forward.

init_weights()[source]#

Init weights.

class otx.algorithms.segmentation.adapters.mmseg.MeanTeacherSegmentor(orig_type, num_iters_per_epoch=None, unsup_weight=0.1, proto_weight=0.7, drop_unrel_pixels_percent=20, semisl_start_epoch=1, proto_head=None, **kwargs)[source]#

Bases: BaseSegmentor

Mean teacher segmentor for semi-supervised learning.

It creates a student and a teacher model, with the teacher updated as an exponential moving average (EMA) of the student, and computes a consistency loss between their predictions.

Parameters:
  • orig_type (BaseSegmentor) – original type of segmentor to build student and teacher models

  • num_iters_per_epoch (int) – number of iterations per training epoch.

  • unsup_weight (float) – loss weight for unsupervised part. Default: 0.1

  • proto_weight (float) – loss weight for pixel prototype cross entropy loss. Default: 0.7

  • drop_unrel_pixels_percent (int) – starting percentage of high-entropy pixels to drop from the teacher's pseudo labels. Default: 20

  • semisl_start_epoch (int) – epoch to start learning with unlabeled images. Default: 1

  • proto_head (dict) – configuration to construct the prototype network. Default: None

Initialize BaseModule, inherited from torch.nn.Module
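
Example

A hedged config sketch for wrapping an existing segmentor for semi-supervised training; the orig_type value is an assumed name, the numeric values are simply the documented defaults, and the wrapped segmentor's own fields are omitted:

model = dict(
    type="MeanTeacherSegmentor",
    orig_type="OTXEncoderDecoder",   # assumed name of the wrapped segmentor
    unsup_weight=0.1,
    proto_weight=0.7,
    drop_unrel_pixels_percent=20,
    semisl_start_epoch=1,
    # ... plus the wrapped segmentor's fields (backbone, decode_head, ...)
)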

aug_test(imgs, img_metas, **kwargs)[source]#

Aug test.

decode_proto_network(sup_input, gt_semantic_seg, unsup_input=None, pl_from_teacher=None, reweight_unsup=1.0)[source]#

Forward prototype network, compute proto loss.

If there is no unsupervised part, only supervised loss will be computed.

Parameters:
  • sup_input (torch.Tensor) – student output from labeled images

  • gt_semantic_seg (torch.Tensor) – ground truth semantic segmentation label maps

  • unsup_input (torch.Tensor) – student output from unlabeled images. Default: None

  • pl_from_teacher (torch.Tensor) – teacher generated pseudo labels. Default: None

  • reweight_unsup (float) – reweighting coefficient for unsupervised part after filtering high entropy pixels. Default: 1.0

encode_decode(img, img_metas)[source]#

Encode and decode images.

extract_feat(imgs)[source]#

Extract feature.

forward_dummy(img, **kwargs)[source]#

Forward dummy.

forward_train(img, img_metas, gt_semantic_seg, **kwargs)[source]#

Forward train.

Parameters:
  • img (torch.Tensor) – labeled images

  • img_metas (dict) – labeled images meta data

  • gt_semantic_seg (torch.Tensor) – semantic segmentation label maps

  • kwargs (dict) – keyword arguments with unlabeled components and additional information

generate_pseudo_labels(ul_w_img, ul_img_metas)[source]#

Generate pseudo labels from teacher model, apply filter loss method.

Parameters:
  • ul_w_img (torch.Tensor) – weakly augmented unlabeled images

  • ul_img_metas (dict) – unlabeled images meta data

static load_state_dict_pre_hook(module, state_dict, *args, **kwargs)[source]#

Redirect input state_dict to teacher model.

simple_test(img, img_meta, **kwargs)[source]#

Simple test.

static state_dict_hook(module, state_dict, prefix, *args, **kwargs)[source]#

Redirect student model as output state_dict (teacher as auxiliary).

class otx.algorithms.segmentation.adapters.mmseg.OTXSegDataset(**kwargs)[source]#

Bases: _OTXSegDataset

Wrapper dataset that allows using an OTX dataset to train models.

class otx.algorithms.segmentation.adapters.mmseg.PolyScalarScheduler(start_scale: float, end_scale: float, num_iters: int, power: float = 1.2, by_epoch: bool = False)[source]#

Bases: BaseScalarScheduler

The learning rate changes over time according to a polynomial schedule.

Parameters:
  • start_scale (float) – The initial learning rate scale.

  • end_scale (float) – The final learning rate scale.

  • num_iters (int) – The number of iterations to reach the final learning rate.

  • power (float) – The power of the polynomial schedule.

  • by_epoch (bool) – Whether to use epoch as the unit of iteration.
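
A plain-Python sketch of the polynomial interpolation this scheduler implements, assuming the conventional poly formulation; the actual class may differ in clamping or epoch handling:

def poly_scale(step, start_scale, end_scale, num_iters, power=1.2):
    # Interpolates from start_scale (step 0) to end_scale (step num_iters).
    if step >= num_iters:
        return end_scale
    factor = (1.0 - step / num_iters) ** power
    return end_scale + (start_scale - end_scale) * factor

poly_scale(0, 1.0, 0.01, 100)    # -> 1.0
poly_scale(100, 1.0, 0.01, 100)  # -> 0.01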

class otx.algorithms.segmentation.adapters.mmseg.SelfSLMLP(in_channels: int, hid_channels: int, out_channels: int, norm_cfg: Dict[str, Any] = {'type': 'BN1d'}, use_conv: bool = False, with_avg_pool: bool = True)[source]#

Bases: Module

The SelfSLMLP neck: fc/conv-bn-relu-fc/conv.

Parameters:
  • in_channels (int) – The number of feature output channels from backbone.

  • hid_channels (int) – The number of channels for a hidden layer.

  • out_channels (int) – The number of output channels of SelfSLMLP.

  • norm_cfg (dict) – Normalize configuration. Default: dict(type=”BN1d”).

  • use_conv (bool) – Whether using conv instead of fc. Default: False.

  • with_avg_pool (bool) – Whether using average pooling before passing MLP. Default: True.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
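
A minimal usage sketch, assuming the default fc-bn-relu-fc path with average pooling maps (N, C, H, W) features to (N, out_channels) vectors; the channel sizes are illustrative:

import torch
from otx.algorithms.segmentation.adapters.mmseg import SelfSLMLP

neck = SelfSLMLP(in_channels=2048, hid_channels=4096, out_channels=256)
feats = torch.randn(4, 2048, 7, 7)  # backbone feature map
out = neck(feats)                   # expected shape: (4, 256)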

forward(x)[source]#

Forward SelfSLMLP.

Parameters:

x (Tensor, tuple, list) – Inputs to the MLP. If the input is a tuple or list, only the last element is used.

Returns:

Features passed through the SelfSLMLP.

Return type:

Tensor

init_weights(init_linear: str = 'normal', std: float = 0.01, bias: float = 0.0)[source]#

Initialize SelfSLMLP weights.

Parameters:
  • init_linear (str) – Option to initialize weights. Default: “normal”.

  • std (float) – Standard deviation for normal initialization. Default: 0.01.

  • bias (float) – Bias for normal initialization. Default: 0.

class otx.algorithms.segmentation.adapters.mmseg.StepScalarScheduler(scales: List[float], num_iters: List[int], by_epoch: bool = False)[source]#

Bases: BaseScalarScheduler

Step learning rate scheduler.

Example

>>> scheduler = StepScalarScheduler(scales=[1.0, 0.1, 0.01], num_iters=[100, 200])

This means that the learning rate will be 1.0 for the first 100 iterations, 0.1 for the next 200 iterations, and 0.01 for the rest of the iterations.

Parameters:
  • scales (List[float]) – List of learning rate scales.

  • num_iters (List[int]) – A list specifying the count of iterations at each scale.

  • by_epoch (bool) – Whether to use epoch as the unit of iteration.
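
A plain-Python illustration of the documented behaviour above (not the class's actual implementation):

from bisect import bisect_right

def step_scale(step, scales, num_iters):
    # Cumulative boundaries: [100, 300] for num_iters=[100, 200].
    boundaries, total = [], 0
    for n in num_iters:
        total += n
        boundaries.append(total)
    return scales[bisect_right(boundaries, step)]

step_scale(99, [1.0, 0.1, 0.01], [100, 200])   # -> 1.0
step_scale(150, [1.0, 0.1, 0.01], [100, 200])  # -> 0.1
step_scale(300, [1.0, 0.1, 0.01], [100, 200])  # -> 0.01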

class otx.algorithms.segmentation.adapters.mmseg.SupConDetConB(backbone: Dict[str, Any], decode_head: Dict[str, Any] | None = None, neck: Dict[str, Any] | None = None, head: Dict[str, Any] | None = None, pretrained: str | None = None, base_momentum: float = 0.996, num_classes: int = 256, num_samples: int = 16, downsample: int = 32, input_transform: str = 'resize_concat', in_index: List[int] | int = [0], align_corners: bool = False, train_cfg: Dict[str, Any] | None = None, test_cfg: Dict[str, Any] | None = None, **kwargs)[source]#

Bases: OTXEncoderDecoder

Apply DetConB as a contrastive part of Supervised Contrastive Learning (https://arxiv.org/abs/2004.11362).

SupCon with DetConB uses ground truth masks instead of pseudo masks to organize features among the same classes.

Parameters:
  • decode_head (dict, optional) – Config dict for module of decode head. Default: None.

  • train_cfg (dict, optional) – Config dict for training. Default: None.

Initialize BaseModule, inherited from torch.nn.Module

forward_train(img, img_metas, gt_semantic_seg, **kwargs)[source]#

Forward function for training.

Parameters:
  • img (Tensor) – Input images.

  • img_metas (list[dict]) – Input information.

  • gt_semantic_seg (Tensor) – Ground truth masks. It is used to organize features among the same classes.

  • **kwargs (Any) – Additional keyword arguments.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]