otx.algorithms.detection.adapters.mmdet.models.heads#

Initialization file for mmdetection heads.

Classes

CrossDatasetDetectorHead([init_cfg])

Head class for Ignore labels.

SSDAnchorGeneratorClustered(strides, widths, ...)

Custom Anchor Generator for SSD.

CustomATSSHead(*args[, bg_loss_weight, ...])

CustomATSSHead for OTX template.

CustomDINOHead(*args[, dn_cfg])

Head of DINO.

CustomFCNMaskHead([num_convs, ...])

Custom FCN Mask Head for fast mask evaluation.

CustomRetinaHead(*args[, bg_loss_weight])

CustomRetinaHead class for OTX.

CustomSSDHead(*args[, bg_loss_weight, ...])

CustomSSDHead class for OTX.

CustomRoIHead([bbox_roi_extractor, ...])

CustomROIHead class for OTX.

CustomVFNetHead(*args[, bg_loss_weight])

CustomVFNetHead class for OTX.

CustomYOLOXHead(*args, **kwargs)

CustomYOLOXHead class for OTX.

DETRHeadExtension([init_cfg])

Head of DETR.

CustomRPNHead(in_channels[, init_cfg, num_convs])

RPN head.

CustomATSSHeadTrackingLossDynamics(*args[, ...])

CustomATSSHead which supports tracking loss dynamics.

class otx.algorithms.detection.adapters.mmdet.models.heads.CrossDatasetDetectorHead(init_cfg=None)[source]#

Bases: BaseDenseHead

Head class for Ignore labels.

Initialize BaseModule, inherited from torch.nn.Module

get_atss_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#

Get targets for Detection head.

This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple. However, if the detector’s head loss uses CrossSigmoidFocalLoss, the labels_weights_list consists of (binarized label schema * weights) of the batch images.

get_fcos_targets(points, gt_bboxes_list, gt_labels_list, img_metas)[source]#

Compute regression, classification and centerness targets for points in multiple images.

Parameters:
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).

  • gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).

  • img_metas (list[dict]) – Meta information for the image.

Returns:

  • concat_lvl_labels (list[Tensor]): Labels of each level.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.

Return type:

tuple

get_valid_label_mask(img_metas, all_labels, use_bg=False)[source]#

Getter function for valid_label_mask.

vfnet_to_atss_targets(cls_scores, mlvl_points, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

A wrapper for computing ATSS targets for points in multiple images.

Parameters:
  • cls_scores (list[Tensor]) – Box iou-aware scores for each scale level with shape (N, num_points * num_classes, H, W).

  • mlvl_points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • gt_bboxes (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).

  • gt_labels (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | Tensor) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4). Default: None.

Returns:

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights (Tensor): Label weights of all levels.

  • bbox_targets_list (list[Tensor]): Regression targets of each level in (l, t, r, b) format.

  • bbox_weights (Tensor): BBox weights of all levels.

Return type:

tuple

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomATSSHead(*args, bg_loss_weight=-1.0, use_qfl=False, qfl_cfg=None, **kwargs)[source]#

Bases: CrossDatasetDetectorHead, ATSSHead

CustomATSSHead for OTX template.

Initialize BaseModule, inherited from torch.nn.Module
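
Below is a minimal build-and-forward sketch, assuming the OTX package is installed so this head is registered in mmdet’s HEADS registry; the channel sizes, anchor settings, and losses are illustrative, not values from a specific OTX template:

    import torch
    from mmdet.models import build_head

    # Importing the OTX heads module registers the custom heads in mmdet's registry.
    import otx.algorithms.detection.adapters.mmdet.models.heads  # noqa: F401

    atss_head = build_head(dict(
        type='CustomATSSHead',
        num_classes=2,
        in_channels=64,
        feat_channels=64,
        bg_loss_weight=-1.0,  # values < 0 keep the default background loss weighting
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0)))

    # One feature map per pyramid level; forward_single runs on each of them.
    feats = [torch.rand(1, 64, 64 // 2 ** i, 64 // 2 ** i) for i in range(5)]
    cls_scores, bbox_preds, centernesses = atss_head(feats)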

forward_single(x, scale)[source]#

Forward feature of a single scale level.

Parameters:
  • x (Tensor) – Features of a single scale level.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

Returns:

  • cls_score (Tensor): Classification scores for a single scale level; the channel number is num_anchors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level; the channel number is num_anchors * 4.

  • centerness (Tensor): Centerness for a single scale level with shape (N, num_anchors * 1, H, W).

Return type:

tuple

get_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#

Get targets for Detection head.

This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple. However, if the detector’s head loss uses CrossSigmoidFocalLoss, the labels_weights_list consists of (binarized label schema * weights) of the batch images.

loss(cls_scores, bbox_preds, centernesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

Compute losses of the head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)

  • centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (list[Tensor] | None) – specify which bounding boxes can be ignored when computing the loss.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_single(anchors, cls_score, bbox_pred, centerness, labels, label_weights, bbox_targets, valid_label_mask, num_total_samples)[source]#

Compute loss of a single scale level.

Parameters:
  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • cls_score (Tensor) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • centerness (Tensor) – Centerness for each scale level with shape (N, num_anchors * 1, H, W).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • valid_label_mask (Tensor) – Label mask for consideration of ignored label with shape (N, num_total_anchors, 1).

  • num_total_samples (int) – Number of positive samples that is reduced over all GPUs.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomATSSHeadTrackingLossDynamics(*args, bg_loss_weight=-1, use_qfl=False, qfl_cfg=None, **kwargs)[source]#

Bases: TrackingLossDynamicsMixIn, CustomATSSHead

CustomATSSHead which supports tracking loss dynamics.

Initialize BaseModule, inherited from torch.nn.Module

get_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#

Get targets for Detection head.

loss(cls_scores, bbox_preds, centernesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

Compute losses of the head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)

  • centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (list[Tensor] | None) – specify which bounding boxes can be ignored when computing the loss.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_single(anchors, cls_score, bbox_pred, centerness, labels, label_weights, bbox_targets, valid_label_mask, num_total_samples)[source]#

Compute loss of a single scale level.

Parameters:
  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • cls_score (Tensor) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • centerness (Tensor) – Centerness for each scale level with shape (N, num_anchors * 1, H, W).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • valid_label_mask (Tensor) – Label mask for consideration of ignored label with shape (N, num_total_anchors, 1).

  • num_total_samples (int) – Number of positive samples that is reduced over all GPUs.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomDINOHead(*args, dn_cfg: Config | None = None, **kwargs)[source]#

Bases: DeformableDETRHead, DETRHeadExtension

Head of DINO.

Based on detr_head.py and deformable_detr.py in mmdet2.x, some functions from dino_head.py in mmdet3.x are added. Forward structure:

  • Training: self.forward_train -> self.forward_transformer -> self.forward -> self.loss

  • Inference: self.simple_test_bboxes -> self.forward_transformer -> self.forward -> self.get_bboxes

Initialize BaseModule, inherited from torch.nn.Module

forward(hidden_states: Tensor, references: List[Tensor])[source]#

Forward function.

Original implementation: forward function of deformable_detr_head.py in mmdet3.x. What’s changed: None.

Parameters:
  • hidden_states (Tensor) – Hidden states output from each decoder layer, has shape (num_decoder_layers, bs, num_queries, dim).

  • references (list[Tensor]) – List of the reference from the decoder. The first reference is the init_reference (initial) and the other num_decoder_layers(6) references are inter_references (intermediate). The init_reference has shape (bs, num_queries, 4) when as_two_stage of the detector is True, otherwise (bs, num_queries, 2). Each inter_reference has shape (bs, num_queries, 4) when with_box_refine of the detector is True, otherwise (bs, num_queries, 2). The coordinates are arranged as (cx, cy) when the last dimension is 2, and (cx, cy, w, h) when it is 4.

Returns:

results of head containing the following tensor.

  • all_layers_outputs_classes (Tensor): Outputs from the classification head, has shape (num_decoder_layers, bs, num_queries, cls_out_channels).

  • all_layers_outputs_coords (Tensor): Sigmoid outputs from the regression head with normalized coordinate format (cx, cy, w, h), has shape (num_decoder_layers, bs, num_queries, 4) with the last dimension arranged as (cx, cy, w, h).

Return type:

tuple[Tensor]

forward_train(x: Tuple[Tensor], img_metas: List[Dict[str, Any]], gt_bboxes: List[Tensor], gt_labels: List[Tensor] | None = None, gt_bboxes_ignore: List[Tensor] | None = None, proposal_cfg: Config | None = None)[source]#

Forward function for training mode.

Original implementation: forward_train function of detr_head.py in mmdet2.x. What’s changed: self.forward was divided into self.forward_transformer + self.forward; this structure comes from mmdet3.x.

Parameters:
  • x (list[Tensor]) – Features from backbone.

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes (List[Tensor]) – Ground truth bboxes of the image, shape (num_gts, 4).

  • gt_labels (List[Tensor]) – Ground truth labels of each box, shape (num_gts,).

  • gt_bboxes_ignore (List[Tensor]) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).

  • proposal_cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

forward_transformer(mlvl_feats: Tuple[Tensor], gt_bboxes: List[Tensor] | None, gt_labels: List[Tensor] | None, img_metas: List[Dict[str, Any]])[source]#

Transformer’s forward function.

Original implementation: forward function of deformable_detr_head.py in mmdet2.x. What’s changed: the original implementation post-processes the outputs of self.transformer, whereas this function returns the outputs of self.transformer directly.

Parameters:
  • mlvl_feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor with shape (N, C, H, W).

  • gt_bboxes (List[Tensor | None]) – List of ground truth bboxes. When the model is evaluated, it will be a list of None.

  • gt_labels (List[Tensor | None]) – List of ground truth labels. When the model is evaluated, it will be a list of None.

  • img_metas (list[dict]) – List of image information.

Returns:

  • all_cls_scores (Tensor): Outputs from the classification head, shape [nb_dec, bs, num_query, cls_out_channels]. Note that cls_out_channels should include the background class.

  • all_bbox_preds (Tensor): Sigmoid outputs from the regression head with normalized coordinate format (cx, cy, w, h), shape [nb_dec, bs, num_query, 4].

  • enc_outputs_class (Tensor): The score of each point on the encoded feature map, shape (N, h*w, num_class). Returned only when as_two_stage is True; otherwise None.

  • enc_outputs_coord (Tensor): The proposals generated from the encoded feature map, shape (N, h*w, 4). Returned only when as_two_stage is True; otherwise None.

  • dn_meta (Dict[str, int]): The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used to split the outputs of the denoising and matching parts and for loss calculation.

Return type:

tuple

get_dn_targets(batch_gt_instances: List[Config], batch_img_metas: List[Dict], dn_meta: Dict[str, int]) tuple[source]#

Get targets in denoising part for a batch of images.

Original implementation: get_dn_targets function of dino_head.py in mmdet3.x. What’s changed: None.

Parameters:
  • batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • dn_meta (Dict[str, int]) – The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used to split the outputs of the denoising and matching parts and for loss calculation.

Returns:

a tuple containing the following targets.

  • labels_list (list[Tensor]): Labels for all images.

  • label_weights_list (list[Tensor]): Label weights for all images.

  • bbox_targets_list (list[Tensor]): BBox targets for all images.

  • bbox_weights_list (list[Tensor]): BBox weights for all images.

  • num_total_pos (int): Number of positive samples in all images.

  • num_total_neg (int): Number of negative samples in all images.

Return type:

tuple

loss(hidden_states: Tensor, references: List[Tensor], enc_outputs_class: Tensor, enc_outputs_coord: Tensor, dn_meta: Dict[str, int], batch_data_samples: List[Config]) dict[source]#

Perform forward propagation and loss calculation.

Original implementation: loss function of dino_head.py in mmdet3.x. What’s changed: loss_by_feat was renamed to loss_by_feat_two_stage because its inputs differ from the parent’s implementation.

Parameters:
  • hidden_states (Tensor) – Hidden states output from each decoder layer, has shape (num_decoder_layers, bs, num_queries_total, dim), where num_queries_total is the sum of num_denoising_queries and num_matching_queries when self.training is True, else num_matching_queries.

  • references (list[Tensor]) – List of the reference from the decoder. The first reference is the init_reference (initial) and the other num_decoder_layers(6) references are inter_references (intermediate). The init_reference has shape (bs, num_queries_total, 4) and each inter_reference has shape (bs, num_queries, 4) with the last dimension arranged as (cx, cy, w, h).

  • enc_outputs_class (Tensor) – The score of each point on encode feature map, has shape (bs, num_feat_points, cls_out_channels).

  • enc_outputs_coord (Tensor) – The proposal generate from the encode feature map, has shape (bs, num_feat_points, 4) with the last dimension arranged as (cx, cy, w, h).

  • dn_meta (Dict[str, int]) – The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used to split the outputs of the denoising and matching parts and for loss calculation.

  • batch_data_samples (List[Config]) – The same as batch_data_samples in mmdet3.x. It contains meta_info (== img_metas) and gt_instances (== (gt_bboxes, gt_labels)).

Returns:

A dictionary of loss components.

Return type:

dict

loss_by_feat_two_stage(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, enc_cls_scores: Tensor, enc_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], dn_meta: Dict[str, int], batch_gt_instances_ignore=None) Dict[str, Tensor][source]#

Loss function.

Original implementation: loss_by_feat function of dino_head.py in mmdet3.x. What’s changed: the function was renamed because the parent’s loss_by_feat takes different inputs.

Parameters:
  • all_layers_cls_scores (Tensor) – Classification scores of all decoder layers, has shape (num_decoder_layers, bs, num_queries_total, cls_out_channels), where num_queries_total is the sum of num_denoising_queries and num_matching_queries.

  • all_layers_bbox_preds (Tensor) – Regression outputs of all decoder layers. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_queries_total, 4).

  • enc_cls_scores (Tensor) – The score of each point on encode feature map, has shape (bs, num_feat_points, cls_out_channels).

  • enc_bbox_preds (Tensor) – The proposal generate from the encode feature map, has shape (bs, num_feat_points, 4) with the last dimension arranged as (cx, cy, w, h).

  • batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • dn_meta (Dict[str, int]) – The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used to split the outputs of the denoising and matching parts and for loss calculation.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_dn(all_layers_denoising_cls_scores: Tensor, all_layers_denoising_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], dn_meta: Dict[str, int]) Tuple[List[Tensor], ...][source]#

Calculate denoising loss.

Original implementation: loss_dn function of dino_head.py in mmdet3.x. What’s changed: None.

Parameters:
  • all_layers_denoising_cls_scores (Tensor) – Classification scores of all decoder layers in denoising part, has shape (num_decoder_layers, bs, num_denoising_queries, cls_out_channels).

  • all_layers_denoising_bbox_preds (Tensor) – Regression outputs of all decoder layers in denoising part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_denoising_queries, 4).

  • batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • dn_meta (Dict[str, int]) – The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used to split the outputs of the denoising and matching parts and for loss calculation.

Returns:

The loss_dn_cls, loss_dn_bbox, and loss_dn_iou of each decoder layer.

Return type:

Tuple[List[Tensor]]

simple_test_bboxes(feats: Tuple[Tensor], img_metas: List[Dict[str, Any]], rescale=False)[source]#

Test det bboxes without test-time augmentation.

Original implementation: simple_test_bboxes function of detr_head.py in mmdet2.x. What’s changed: self.forward was divided into self.forward_transformer and self.forward; this change comes from mmdet3.x.

Parameters:
  • feats (tuple[torch.Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • img_metas (list[dict]) – List of image information.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns:

Each item in result_list is a 2-tuple. The first item is a bboxes tensor with shape (n, 5), where the 5 columns are (tl_x, tl_y, br_x, br_y, score). The second item is a labels tensor with shape (n,).

Return type:

list[tuple[Tensor, Tensor]]

static split_outputs(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, dn_meta: Dict[str, int]) Tuple[Tensor, ...][source]#

Split outputs of the denoising part and the matching part.

Original implementation: split_outputs function of dino_head.py in mmdet3.x. What’s changed: None.

For the total outputs of num_queries_total length, the former num_denoising_queries outputs are from denoising queries, and the rest num_matching_queries ones are from matching queries, where num_queries_total is the sum of num_denoising_queries and num_matching_queries.

Parameters:
  • all_layers_cls_scores (Tensor) – Classification scores of all decoder layers, has shape (num_decoder_layers, bs, num_queries_total, cls_out_channels).

  • all_layers_bbox_preds (Tensor) – Regression outputs of all decoder layers. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_queries_total, 4).

  • dn_meta (Dict[str, int]) – The dictionary that saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’.

Returns:

a tuple containing the following outputs.

  • all_layers_matching_cls_scores (Tensor): Classification scores of all decoder layers in matching part, has shape (num_decoder_layers, bs, num_matching_queries, cls_out_channels).

  • all_layers_matching_bbox_preds (Tensor): Regression outputs of all decoder layers in matching part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_matching_queries, 4).

  • all_layers_denoising_cls_scores (Tensor): Classification scores of all decoder layers in denoising part, has shape (num_decoder_layers, bs, num_denoising_queries, cls_out_channels).

  • all_layers_denoising_bbox_preds (Tensor): Regression outputs of all decoder layers in denoising part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_denoising_queries, 4).

Return type:

Tuple[Tensor]
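
A minimal sketch of consuming this split with hypothetical query counts (in practice, dn_meta comes from forward_transformer):

    import torch

    from otx.algorithms.detection.adapters.mmdet.models.heads import CustomDINOHead

    num_layers, bs, num_dn, num_match, num_cls = 6, 2, 60, 900, 80
    cls_scores = torch.rand(num_layers, bs, num_dn + num_match, num_cls)
    bbox_preds = torch.rand(num_layers, bs, num_dn + num_match, 4)
    dn_meta = {'num_denoising_queries': num_dn, 'num_denoising_groups': 5}

    # Denoising queries come first along the query axis; matching queries follow.
    match_cls, match_bbox, dn_cls, dn_bbox = CustomDINOHead.split_outputs(
        cls_scores, bbox_preds, dn_meta)
    assert match_cls.shape == (num_layers, bs, num_match, num_cls)
    assert dn_cls.shape == (num_layers, bs, num_dn, num_cls)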

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomFCNMaskHead(num_convs=4, roi_feat_size=14, in_channels=256, conv_kernel_size=3, conv_out_channels=256, num_classes=80, class_agnostic=False, upsample_cfg={'scale_factor': 2, 'type': 'deconv'}, conv_cfg=None, norm_cfg=None, predictor_cfg={'type': 'Conv'}, loss_mask={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_mask': True}, init_cfg=None)[source]#

Bases: FCNMaskHead

Custom FCN Mask Head for fast mask evaluation.

Initialize BaseModule, inherited from torch.nn.Module
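
A minimal forward sketch using the defaults shown in the signature above; only num_classes is changed:

    import torch

    from otx.algorithms.detection.adapters.mmdet.models.heads import CustomFCNMaskHead

    mask_head = CustomFCNMaskHead(num_classes=2)
    roi_feats = torch.rand(8, 256, 14, 14)  # 8 sampled RoIs; matches in_channels and roi_feat_size
    mask_pred = mask_head(roi_feats)        # (8, 2, 28, 28): the deconv upsample doubles 14 -> 28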

get_scaled_seg_masks(*args, **kwargs)[source]#

Original method get_seg_masks from FCNMaskHead. Used in the Semi-SL algorithm.

get_seg_masks(mask_pred, det_bboxes, det_labels, rcnn_test_cfg, ori_shape, scale_factor, rescale)[source]#

Get segmentation masks from mask_pred and bboxes.

The original FCNMaskHead.get_seg_masks grid-sampled the 28 x 28 masks up to the original image resolution. As a result, the resized masks occupied a large amount of memory and slowed down inference. This method instead returns the 28 x 28 masks directly and resizes them to the bounding-box size in a post-processing step, which saves memory and speeds up inference.

Parameters:
  • mask_pred (Tensor or ndarray) – shape (n, #class, h, w). For single-scale testing, mask_pred is the direct output of model, whose type is Tensor, while for multi-scale testing, it will be converted to numpy array outside of this method.

  • det_bboxes (Tensor) – shape (n, 4/5)

  • det_labels (Tensor) – shape (n, )

  • rcnn_test_cfg (dict) – rcnn testing config

  • ori_shape (Tuple) – original image height and width, shape (2,)

  • scale_factor (ndarray | Tensor) – If rescale is True, box coordinates are divided by this scale factor to fit ori_shape.

  • rescale (bool) – If True, the resulting masks will be rescaled to ori_shape.

Returns:

Encoded masks. The c-th item in the outer list corresponds to the c-th class; within that inner list, the i-th item is the mask for the i-th box with class label c.

Return type:

list[list]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRPNHead(in_channels, init_cfg={'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'}, num_convs=1, **kwargs)[source]#

Bases: RPNHead

RPN head.

Parameters:
  • in_channels (int) – Number of channels in the input feature map.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

  • num_convs (int) – Number of convolution layers in the head. Default 1.

Initialize BaseModule, inherited from torch.nn.Module
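
A minimal build sketch, assuming OTX registration as in the CustomATSSHead example above; the anchor and coder settings are illustrative:

    from mmdet.models import build_head

    rpn_head = build_head(dict(
        type='CustomRPNHead',
        in_channels=256,
        num_convs=2,  # stack two 3x3 convs instead of the default single conv
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0])))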

forward_single(x)[source]#

Forward feature map of a single scale level.

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRetinaHead(*args, bg_loss_weight=-1.0, **kwargs)[source]#

Bases: RetinaHead

CustomRetinaHead class for OTX.

Initialize BaseModule, inherited from torch.nn.Module
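
A minimal build sketch reusing RetinaHead’s default anchor generator and losses. The reading of bg_loss_weight below (values >= 0 re-weight the loss of background anchors) is an assumption based on the parameter name, not a documented contract:

    from mmdet.models import build_head

    retina_head = build_head(dict(
        type='CustomRetinaHead',
        num_classes=2,
        in_channels=256,
        bg_loss_weight=0.1))  # assumed: down-weights background anchors in the loss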

loss_single(cls_score, bbox_pred, anchors, labels, label_weights, bbox_targets, bbox_weights, num_total_samples)[source]#

Compute loss of a single scale level.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).

  • num_total_samples (int) – If sampling, num total samples equal to the number of total anchors; Otherwise, it is the number of positive anchors.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRoIHead(bbox_roi_extractor=None, bbox_head=None, mask_roi_extractor=None, mask_head=None, shared_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]#

Bases: StandardRoIHead

CustomROIHead class for OTX.

Initialize BaseModule, inherited from torch.nn.Module

init_bbox_head(bbox_roi_extractor, bbox_head)[source]#

Initialize bbox_head.

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomSSDHead(*args, bg_loss_weight=-1.0, loss_cls=None, loss_balancing=False, **kwargs)[source]#

Bases: SSDHead

CustomSSDHead class for OTX.

Initialize BaseModule, inherited from torch.nn.Module

forward(feats)[source]#

Forward features from the upstream network.

Parameters:

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns:

  • cls_scores (list[Tensor]): Classification scores for all scale levels; each is a 4D tensor whose channel number is num_anchors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels; each is a 4D tensor whose channel number is num_anchors * 4.

Return type:

tuple
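
A minimal sketch pairing CustomSSDHead with the SSDAnchorGeneratorClustered generator documented below; the widths/heights are illustrative, not re-clustered values from an OTX template:

    import torch
    from mmdet.models import build_head

    ssd_head = build_head(dict(
        type='CustomSSDHead',
        num_classes=2,
        in_channels=(96, 320),
        anchor_generator=dict(
            type='SSDAnchorGeneratorClustered',
            strides=(16, 32),
            widths=[[38.6, 92.5], [215.8, 130.9]],
            heights=[[48.9, 107.9], [234.6, 345.0]]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2])))

    feats = (torch.rand(1, 96, 40, 40), torch.rand(1, 320, 20, 20))
    cls_scores, bbox_preds = ssd_head(feats)  # one 4D tensor per feature level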

loss(cls_scores, bbox_preds, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

Loss function.

loss_single(cls_score, bbox_pred, anchor, labels, label_weights, bbox_targets, bbox_weights, num_total_samples)[source]#

Compute loss of a single image.

Parameters:
  • cls_score (Tensor) – Box scores for each image with shape (num_total_anchors, num_classes).

  • bbox_pred (Tensor) – Box energies / deltas for each image with shape (num_total_anchors, 4).

  • anchor (Tensor) – Box reference for each scale level with shape (num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (num_total_anchors,).

  • label_weights (Tensor) – Label weights of each anchor with shape (num_total_anchors,).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (num_total_anchors, 4).

  • num_total_samples (int) – If sampling, num total samples equal to the number of total anchors; Otherwise, it is the number of positive anchors.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomVFNetHead(*args, bg_loss_weight=-1.0, **kwargs)[source]#

Bases: CrossDatasetDetectorHead, VFNetHead

CustomVFNetHead class for OTX.

Initialize BaseModule, inherited from torch.nn.Module

get_targets(cls_scores, mlvl_points, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore)[source]#

A wrapper for computing ATSS and FCOS targets for points in multiple images.

Parameters:
  • cls_scores (list[Tensor]) – Box iou-aware scores for each scale level with shape (N, num_points * num_classes, H, W).

  • mlvl_points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • gt_bboxes (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).

  • gt_labels (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | Tensor) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).

Returns:

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights (Tensor | None): Label weights of all levels.

  • bbox_targets_list (list[Tensor]): Regression targets of each level in (l, t, r, b) format.

  • bbox_weights (Tensor | None): BBox weights of all levels.

Return type:

tuple

loss(cls_scores, bbox_preds, bbox_preds_refine, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

Compute loss of the head.

Parameters:
  • cls_scores (list[Tensor]) – Box iou-aware scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box offsets for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • bbox_preds_refine (list[Tensor]) – Refined Box offsets for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss. Default: None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.CustomYOLOXHead(*args, **kwargs)[source]#

Bases: YOLOXHead

CustomYOLOXHead class for OTX.

Initialize BaseModule, inherited from torch.nn.Module
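
A minimal build-and-forward sketch with illustrative channel sizes (assuming OTX registration as in the examples above):

    import torch
    from mmdet.models import build_head

    yolox_head = build_head(dict(
        type='CustomYOLOXHead',
        num_classes=2,
        in_channels=96,
        feat_channels=96))

    feats = [torch.rand(1, 96, s, s) for s in (32, 16, 8)]  # default strides are 8, 16, 32
    cls_scores, bbox_preds, objectnesses = yolox_head(feats)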

forward_single(x, cls_convs, reg_convs, conv_cls, conv_reg, conv_obj)[source]#

Forward feature of a single scale level.

loss(cls_scores, bbox_preds, objectnesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#

Compute loss of the head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (list[Tensor], Optional) – Score factors for all scale levels; each is a 4D tensor with shape (batch_size, 1, H, W).

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

class otx.algorithms.detection.adapters.mmdet.models.heads.DETRHeadExtension(init_cfg: dict | None = None)[source]#

Bases: BaseModule

Head of DETR. DETR: End-to-End Object Detection with Transformers.

Original implementation: DETRHead of detr_head.py in mmdet3.x. What’s changed: the data type of batch_gt_instances was changed from InstanceList to List[Config], since InstanceList is a data type new to mmdet3.x.

Initialize BaseModule, inherited from torch.nn.Module

loss_by_feat(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], batch_gt_instances_ignore=None) Dict[str, Tensor][source]#

Loss function.

Only outputs from the last feature level are used for computing losses by default.

Parameters:
  • all_layers_cls_scores (Tensor) – Classification outputs of each decoder layers. Each is a 4D-tensor, has shape (num_decoder_layers, bs, num_queries, cls_out_channels).

  • all_layers_bbox_preds (Tensor) – Sigmoid regression outputs of each decoder layers. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and shape (num_decoder_layers, bs, num_queries, 4).

  • batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(cls_scores: Tensor, bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict]) Tuple[Tensor, Tensor, Tensor][source]#

Loss function for outputs from a single decoder layer of a single feature level.

Parameters:
  • cls_scores (Tensor) – Box score logits from a single decoder layer for all images, has shape (bs, num_queries, cls_out_channels).

  • bbox_preds (Tensor) – Sigmoid outputs from a single decoder layer for all images, with normalized coordinate (cx, cy, w, h) and shape (bs, num_queries, 4).

  • batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

Returns:

A tuple including loss_cls, loss_box and loss_iou.

Return type:

Tuple[Tensor]

class otx.algorithms.detection.adapters.mmdet.models.heads.SSDAnchorGeneratorClustered(strides, widths, heights, reclustering_anchors=False)[source]#

Bases: AnchorGenerator

Custom Anchor Generator for SSD.

gen_base_anchors()[source]#

Generate base anchor for SSD.

gen_single_level_base_anchors(ws, hs, center)[source]#

Generate single_level_base_anchors for SSD.
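
A minimal standalone sketch; the widths and heights are hypothetical cluster centers (OTX derives such values by clustering the training boxes when anchor reclustering is enabled):

    from otx.algorithms.detection.adapters.mmdet.models.heads import SSDAnchorGeneratorClustered

    anchor_gen = SSDAnchorGeneratorClustered(
        strides=(16, 32),
        widths=[[38.6, 92.5], [215.8, 130.9]],
        heights=[[48.9, 107.9], [234.6, 345.0]],
    )
    base_anchors = anchor_gen.gen_base_anchors()  # one (num_anchors_per_level, 4) tensor per level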