otx.algorithms.detection.adapters.mmdet.models.heads#
Initial file for mmdetection heads.
Classes
CrossDatasetDetectorHead | Head class for Ignore labels.
| Custom Anchor Generator for SSD.
CustomATSSHead | CustomATSSHead for OTX template.
CustomDINOHead | Head of DINO.
CustomFCNMaskHead | Custom FCN Mask Head for fast mask evaluation.
CustomRetinaHead | CustomRetinaHead class for OTX.
CustomSSDHead | CustomSSDHead class for OTX.
CustomRoIHead | CustomROIHead class for OTX.
CustomVFNetHead | CustomVFNetHead class for OTX.
CustomYOLOXHead | CustomYOLOXHead class for OTX.
DETRHeadExtension | Head of DETR.
CustomRPNHead | RPN head.
CustomATSSHeadTrackingLossDynamics | CustomATSSHead which supports tracking loss dynamics.
- class otx.algorithms.detection.adapters.mmdet.models.heads.CrossDatasetDetectorHead(init_cfg=None)[source]#
Bases:
BaseDenseHead
Head class for Ignore labels.
Initialize BaseModule, inherited from torch.nn.Module
- get_atss_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#
Get targets for Detection head.
This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple. However, if the detector’s head loss uses CrossSigmoidFocalLoss, labels_weights_list consists of (binarized label schema * weights) for the batch images.
- get_fcos_targets(points, gt_bboxes_list, gt_labels_list, img_metas)[source]#
Compute regression, classification and centerness targets for points in multiple images.
- Parameters:
- Returns:
concat_lvl_labels (list[Tensor]): Labels of each level.
concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.
- Return type:
- get_valid_label_mask(img_metas, all_labels, use_bg=False)[source]#
Getter function valid_label_mask.
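A minimal sketch of how such a mask could be built. It assumes, without confirmation from the text above, that each entry of img_metas may carry an `ignored_labels` list of class indices and that `use_bg` adds one extra background slot. Plain lists stand in for tensors, and `build_valid_label_mask` is a hypothetical helper, not the OTX implementation.

```python
from typing import Any, Dict, List


def build_valid_label_mask(
    img_metas: List[Dict[str, Any]],
    all_labels: List[str],
    use_bg: bool = False,
) -> List[List[int]]:
    """Return one 0/1 mask per image; 0 marks a label to ignore (assumed convention)."""
    num_classes = len(all_labels) + int(use_bg)
    masks = []
    for meta in img_metas:
        mask = [1] * num_classes
        # "ignored_labels" is an assumed key name used only for this illustration.
        for idx in meta.get("ignored_labels", []):
            mask[idx] = 0
        masks.append(mask)
    return masks


metas = [{"ignored_labels": [1]}, {}]
masks = build_valid_label_mask(metas, ["car", "bus", "bike"])
print(masks)  # [[1, 0, 1], [1, 1, 1]]
```

Multiplying such a mask into the per-class loss weights would zero out the contribution of labels that were not annotated for a given image.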
- vfnet_to_atss_targets(cls_scores, mlvl_points, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
A wrapper for computing ATSS targets for points in multiple images.
- Parameters:
cls_scores (list[Tensor]) – Box iou-aware scores for each scale level with shape (N, num_points * num_classes, H, W).
mlvl_points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
gt_bboxes (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).
gt_labels (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | Tensor) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4). Default: None.
- Returns:
labels_list (list[Tensor]): Labels of each level.
label_weights (Tensor): Label weights of all levels.
bbox_targets_list (list[Tensor]): Regression targets of each level, (l, t, r, b).
bbox_weights (Tensor): Bbox weights of all levels.
- Return type:
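The (l, t, r, b) regression targets are distances from a point to the four sides of a ground truth box. A minimal sketch, assuming boxes in [tl_x, tl_y, br_x, br_y] format; `ltrb_target` and `is_inside` are hypothetical helpers for illustration only.

```python
from typing import Sequence, Tuple


def ltrb_target(
    point: Sequence[float], box: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Distances from a point (x, y) to the left, top, right and bottom box sides."""
    x, y = point
    tl_x, tl_y, br_x, br_y = box
    return (x - tl_x, y - tl_y, br_x - x, br_y - y)


def is_inside(target: Sequence[float]) -> bool:
    """A point lies strictly inside the box only if all four distances are positive."""
    return min(target) > 0


target = ltrb_target((30.0, 40.0), (10.0, 20.0, 110.0, 80.0))
print(target)             # (20.0, 20.0, 80.0, 40.0)
print(is_inside(target))  # True
```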
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomATSSHead(*args, bg_loss_weight=-1.0, use_qfl=False, qfl_cfg=None, **kwargs)[source]#
Bases:
CrossDatasetDetectorHead, ATSSHead
CustomATSSHead for OTX template.
Initialize BaseModule, inherited from torch.nn.Module
- forward_single(x, scale)[source]#
Forward feature of a single scale level.
- Parameters:
x (Tensor) – Features of a single scale level.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
- Returns:
cls_score (Tensor): Cls scores for a single scale level; the number of channels is num_anchors * num_classes.
bbox_pred (Tensor): Box energies / deltas for a single scale level; the number of channels is num_anchors * 4.
centerness (Tensor): Centerness for a single scale level, with shape (N, num_anchors * 1, H, W).
- Return type:
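The centerness branch predicts how close a location is to the center of its assigned box. A minimal sketch of the FCOS-style centerness definition that ATSS-style heads commonly regress against, assuming the four side distances (l, t, r, b) are all positive; `centerness_target` is a hypothetical helper name.

```python
import math


def centerness_target(l: float, t: float, r: float, b: float) -> float:
    """sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)); 1.0 at the box center."""
    left_right = min(l, r) / max(l, r)
    top_bottom = min(t, b) / max(t, b)
    return math.sqrt(left_right * top_bottom)


print(centerness_target(50.0, 30.0, 50.0, 30.0))            # 1.0
print(round(centerness_target(20.0, 20.0, 80.0, 40.0), 4))  # 0.3536
```

The value decays towards 0 as the location approaches any box side, which is why it is useful for down-weighting low-quality predictions far from the center.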
- get_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#
Get targets for Detection head.
This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple. However, if the detector’s head loss uses CrossSigmoidFocalLoss, labels_weights_list consists of (binarized label schema * weights) for the batch images.
- loss(cls_scores, bbox_preds, centernesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
Compute losses of the head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W)
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)
centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (list[Tensor] | None) – specify which bounding boxes can be ignored when computing the loss.
- Returns:
A dictionary of loss components.
- Return type:
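A dictionary of loss components is typically reduced to a single scalar before back-propagation. The sketch below follows the convention used by mmdet 2.x detectors when parsing losses, where only entries whose key contains "loss" contribute to the total. Plain floats stand in for tensors, the values are made up, and `total_loss` is a hypothetical helper.

```python
from typing import Dict, List, Union

LossValue = Union[float, List[float]]


def total_loss(losses: Dict[str, LossValue]) -> float:
    """Sum every component whose key contains 'loss'; per-level lists are summed too."""
    total = 0.0
    for name, value in losses.items():
        if "loss" not in name:
            continue  # metrics such as accuracy are logged but not optimized
        total += sum(value) if isinstance(value, (list, tuple)) else value
    return total


losses = {
    "loss_cls": 0.5,
    "loss_bbox": [0.25, 0.25],  # e.g. one entry per scale level
    "loss_centerness": 0.5,
    "acc": 90.0,
}
print(total_loss(losses))  # 1.5
```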
- loss_single(anchors, cls_score, bbox_pred, centerness, labels, label_weights, bbox_targets, valid_label_mask, num_total_samples)[source]#
Compute loss of a single scale level.
- Parameters:
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
centerness (Tensor) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
valid_label_mask (Tensor) – Label mask for consideration of ignored label with shape (N, num_total_anchors, 1).
num_total_samples (int) – Number of positive samples that is reduced over all GPUs.
- Returns:
A dictionary of loss components.
- Return type:
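The score map of shape (N, num_anchors * num_classes, H, W) has to be aligned with per-anchor labels of shape (N, num_total_anchors). A minimal sketch of that flattening, assuming the mmdet-style layout in which the channel axis is moved last and then split into rows of num_classes (equivalent to `permute(0, 2, 3, 1).reshape(-1, num_classes)` on a tensor). Nested lists stand in for tensors and `flatten_cls_score` is a hypothetical helper.

```python
from typing import List


def flatten_cls_score(cls_score: List, num_classes: int) -> List[List[float]]:
    """Turn [N][A*C][H][W] nested lists into rows of length C, ordered by (n, h, w, a)."""
    rows = []
    for image in cls_score:
        channels = len(image)
        height, width = len(image[0]), len(image[0][0])
        for h in range(height):
            for w in range(width):
                # All channels at one spatial location.
                vec = [image[c][h][w] for c in range(channels)]
                # Channel c belongs to anchor c // C and class c % C.
                for a in range(channels // num_classes):
                    rows.append(vec[a * num_classes:(a + 1) * num_classes])
    return rows


# N=1, num_anchors=2, num_classes=2, H=1, W=2; value = 10 * channel + column.
score = [[[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]]
rows = flatten_cls_score(score, num_classes=2)
print(rows)  # [[0, 10], [20, 30], [1, 11], [21, 31]]
```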
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomATSSHeadTrackingLossDynamics(*args, bg_loss_weight=-1, use_qfl=False, qfl_cfg=None, **kwargs)[source]#
Bases:
TrackingLossDynamicsMixIn, CustomATSSHead
CustomATSSHead which supports tracking loss dynamics.
Initialize BaseModule, inherited from torch.nn.Module
- get_targets(anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list=None, gt_labels_list=None, label_channels=1, unmap_outputs=True)[source]#
Get targets for Detection head.
- loss(cls_scores, bbox_preds, centernesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
Compute losses of the head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W)
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)
centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (list[Tensor] | None) – specify which bounding boxes can be ignored when computing the loss.
- Returns:
A dictionary of loss components.
- Return type:
- loss_single(anchors, cls_score, bbox_pred, centerness, labels, label_weights, bbox_targets, valid_label_mask, num_total_samples)[source]#
Compute loss of a single scale level.
- Parameters:
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
centerness (Tensor) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
valid_label_mask (Tensor) – Label mask for consideration of ignored label with shape (N, num_total_anchors, 1).
num_total_samples (int) – Number of positive samples that is reduced over all GPUs.
- Returns:
A dictionary of loss components.
- Return type:
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomDINOHead(*args, dn_cfg: Config | None = None, **kwargs)[source]#
Bases:
DeformableDETRHead, DETRHeadExtension
Head of DINO.
Based on detr_head.py and deformable_detr.py in mmdet2.x, some functions from dino_head.py in mmdet3.x are added. Forward structure:
Training: self.forward_train -> self.forward_transformer -> self.forward -> self.loss
Inference: self.simple_test_bboxes -> self.forward_transformer -> self.forward -> self.get_bboxes
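The two call chains above can be illustrated with a toy stand-in that only records the order in which its methods run. `FlowSketch` is hypothetical: its method names mirror the ones listed above, but the signatures and return values are simplified placeholders and are not those of CustomDINOHead.

```python
from typing import Any, Dict, List


class FlowSketch:
    """Toy object that records call order; it performs no real computation."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def forward_transformer(self, feats: Any) -> Any:
        self.calls.append("forward_transformer")
        return feats

    def forward(self, hidden_states: Any) -> Any:
        self.calls.append("forward")
        return hidden_states

    def loss(self, outputs: Any) -> Dict[str, float]:
        self.calls.append("loss")
        return {"loss_cls": 0.0}

    def get_bboxes(self, outputs: Any) -> List[Any]:
        self.calls.append("get_bboxes")
        return []

    def forward_train(self, feats: Any) -> Dict[str, float]:
        self.calls.append("forward_train")
        return self.loss(self.forward(self.forward_transformer(feats)))

    def simple_test_bboxes(self, feats: Any) -> List[Any]:
        self.calls.append("simple_test_bboxes")
        return self.get_bboxes(self.forward(self.forward_transformer(feats)))


train_head = FlowSketch()
train_head.forward_train(feats=None)
print(train_head.calls)  # ['forward_train', 'forward_transformer', 'forward', 'loss']

test_head = FlowSketch()
test_head.simple_test_bboxes(feats=None)
print(test_head.calls)  # ['simple_test_bboxes', 'forward_transformer', 'forward', 'get_bboxes']
```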
Initialize BaseModule, inherited from torch.nn.Module
- forward(hidden_states: Tensor, references: List[Tensor])[source]#
Forward function.
Original implementation: forward function of deformable_detr_head.py in mmdet3.x. What’s changed: None.
- Parameters:
hidden_states (Tensor) – Hidden states output from each decoder layer, has shape (num_decoder_layers, bs, num_queries, dim).
references (list[Tensor]) – List of the reference from the decoder. The first reference is the init_reference (initial) and the other num_decoder_layers(6) references are inter_references (intermediate). The init_reference has shape (bs, num_queries, 4) when as_two_stage of the detector is True, otherwise (bs, num_queries, 2). Each inter_reference has shape (bs, num_queries, 4) when with_box_refine of the detector is True, otherwise (bs, num_queries, 2). The coordinates are arranged as (cx, cy) when the last dimension is 2, and (cx, cy, w, h) when it is 4.
- Returns:
results of head containing the following tensor.
all_layers_outputs_classes (Tensor): Outputs from the classification head, has shape (num_decoder_layers, bs, num_queries, cls_out_channels).
all_layers_outputs_coords (Tensor): Sigmoid outputs from the regression head with normalized coordinate format (cx, cy, w, h), has shape (num_decoder_layers, bs, num_queries, 4) with the last dimension arranged as (cx, cy, w, h).
- Return type:
tuple[Tensor]
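The regression outputs are normalized (cx, cy, w, h) boxes, while ground truth boxes elsewhere on this page use [tl_x, tl_y, br_x, br_y]. A minimal sketch of the conversion for a single box, assuming normalization by image width and height; `cxcywh_to_xyxy` is a hypothetical helper operating on plain tuples.

```python
from typing import Sequence, Tuple


def cxcywh_to_xyxy(
    box: Sequence[float], img_w: float, img_h: float
) -> Tuple[float, float, float, float]:
    """Convert a normalized (cx, cy, w, h) box to pixel (tl_x, tl_y, br_x, br_y)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


print(cxcywh_to_xyxy((0.5, 0.5, 0.25, 0.5), img_w=640, img_h=480))
# (240.0, 120.0, 400.0, 360.0)
```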
- forward_train(x: Tuple[Tensor], img_metas: List[Dict[str, Any]], gt_bboxes: List[Tensor], gt_labels: List[Tensor] | None = None, gt_bboxes_ignore: List[Tensor] | None = None, proposal_cfg: Config | None = None)[source]#
Forward function for training mode.
Original implementation: forward_train function of detr_head.py in mmdet2.x. What’s changed: self.forward is divided into self.forward_transformer + self.forward. This structure comes from mmdet3.x.
- Parameters:
x (list[Tensor]) – Features from backbone.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes (List[Tensor]) – Ground truth bboxes of the image, shape (num_gts, 4).
gt_labels (List[Tensor]) – Ground truth labels of each box, shape (num_gts,).
gt_bboxes_ignore (List[Tensor]) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).
proposal_cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used.
- Returns:
A dictionary of loss components.
- Return type:
- forward_transformer(mlvl_feats: Tuple[Tensor], gt_bboxes: List[Tensor] | None, gt_labels: List[Tensor] | None, img_metas: List[Dict[str, Any]])[source]#
Transformer’s forward function.
Original implementation: forward function of deformable_detr_head.py in mmdet2.x. What’s changed: The original implementation post-processes the outputs of self.transformer, whereas this function returns the outputs of self.transformer directly.
- Parameters:
mlvl_feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor with shape (N, C, H, W).
gt_bboxes (List[Tensor | None]) – List of ground truth bboxes. When the model is evaluated, this is a list of None.
gt_labels (List[Tensor | None]) – List of ground truth labels. When the model is evaluated, this is a list of None.
- Returns:
all_cls_scores (Tensor): Outputs from the classification head, shape [nb_dec, bs, num_query, cls_out_channels]. Note cls_out_channels should include background.
all_bbox_preds (Tensor): Sigmoid outputs from the regression head with normalized coordinate format (cx, cy, w, h). Shape [nb_dec, bs, num_query, 4].
enc_outputs_class (Tensor): The score of each point on the encoder feature map, has shape (N, h*w, num_class). Returned only when as_two_stage is True; otherwise None is returned.
enc_outputs_coord (Tensor): The proposals generated from the encoder feature map, has shape (N, h*w, 4). Returned only when as_two_stage is True; otherwise None is returned.
dn_meta (Dict[str, int]): The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It is used to split the outputs of the denoising and matching parts and for loss calculation.
- get_dn_targets(batch_gt_instances: List[Config], batch_img_metas: List[Dict], dn_meta: Dict[str, int]) tuple [source]#
Get targets in denoising part for a batch of images.
Original implementation: get_dn_targets function of dino_head.py in mmdet3.x. What’s changed: None.
- Parameters:
batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
dn_meta (Dict[str, int]) – The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used for split outputs of denoising and matching parts and loss calculation.
- Returns:
a tuple containing the following targets.
labels_list (list[Tensor]): Labels for all images.
label_weights_list (list[Tensor]): Label weights for all images.
bbox_targets_list (list[Tensor]): BBox targets for all images.
bbox_weights_list (list[Tensor]): BBox weights for all images.
num_total_pos (int): Number of positive samples in all images.
num_total_neg (int): Number of negative samples in all images.
- Return type:
- loss(hidden_states: Tensor, references: List[Tensor], enc_outputs_class: Tensor, enc_outputs_coord: Tensor, dn_meta: Dict[str, int], batch_data_samples: List[Config]) dict [source]#
Perform forward propagation and loss calculation.
Original implementation: loss function of dino_head.py in mmdet3.x. What’s changed: loss_by_feat is renamed to loss_by_feat_two_stage because its inputs differ from the parent’s implementation.
- Parameters:
hidden_states (Tensor) – Hidden states output from each decoder layer, has shape (num_decoder_layers, bs, num_queries_total, dim), where num_queries_total is the sum of num_denoising_queries and num_matching_queries when self.training is True, else num_matching_queries.
references (list[Tensor]) – List of the reference from the decoder. The first reference is the init_reference (initial) and the other num_decoder_layers(6) references are inter_references (intermediate). The init_reference has shape (bs, num_queries_total, 4) and each inter_reference has shape (bs, num_queries, 4) with the last dimension arranged as (cx, cy, w, h).
enc_outputs_class (Tensor) – The score of each point on the encoder feature map, has shape (bs, num_feat_points, cls_out_channels).
enc_outputs_coord (Tensor) – The proposals generated from the encoder feature map, has shape (bs, num_feat_points, 4) with the last dimension arranged as (cx, cy, w, h).
dn_meta (Dict[str, int]) – The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used for split outputs of denoising and matching parts and loss calculation.
batch_data_samples (List[Config]) – Same as batch_data_samples in mmdet3.x. It contains meta_info (== img_metas) and gt_instances (== (gt_bboxes, gt_labels)).
- Returns:
A dictionary of loss components.
- Return type:
- loss_by_feat_two_stage(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, enc_cls_scores: Tensor, enc_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], dn_meta: Dict[str, int], batch_gt_instances_ignore=None) Dict[str, Tensor] [source]#
Loss function.
Original implementation: loss_by_feat function of dino_head.py in mmdet3.x. What’s changed: The function is renamed because the parent’s loss_by_feat function has different inputs.
- Parameters:
all_layers_cls_scores (Tensor) – Classification scores of all decoder layers, has shape (num_decoder_layers, bs, num_queries_total, cls_out_channels), where num_queries_total is the sum of num_denoising_queries and num_matching_queries.
all_layers_bbox_preds (Tensor) – Regression outputs of all decoder layers. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_queries_total, 4).
enc_cls_scores (Tensor) – The score of each point on the encoder feature map, has shape (bs, num_feat_points, cls_out_channels).
enc_bbox_preds (Tensor) – The proposals generated from the encoder feature map, has shape (bs, num_feat_points, 4) with the last dimension arranged as (cx, cy, w, h).
batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
dn_meta (Dict[str, int]) – The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used for split outputs of denoising and matching parts and loss calculation.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes the bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
- loss_dn(all_layers_denoising_cls_scores: Tensor, all_layers_denoising_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], dn_meta: Dict[str, int]) Tuple[List[Tensor], ...] [source]#
Calculate denoising loss.
Original implementation: loss_dn function of dino_head.py in mmdet3.x. What’s changed: None.
- Parameters:
all_layers_denoising_cls_scores (Tensor) – Classification scores of all decoder layers in denoising part, has shape (num_decoder_layers, bs, num_denoising_queries, cls_out_channels).
all_layers_denoising_bbox_preds (Tensor) – Regression outputs of all decoder layers in denoising part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_denoising_queries, 4).
batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
dn_meta (Dict[str, int]) – The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’. It will be used for split outputs of denoising and matching parts and loss calculation.
- Returns:
The loss_dn_cls, loss_dn_bbox, and loss_dn_iou of each decoder layer.
- Return type:
Tuple[List[Tensor]]
- simple_test_bboxes(feats: Tuple[Tensor], img_metas: List[Dict[str, Any]], rescale=False)[source]#
Test det bboxes without test-time augmentation.
Original implementation: simple_test_bboxes function of detr_head.py in mmdet2.x. What’s changed: self.forward is divided into self.forward_transformer and self.forward. This change comes from mmdet3.x.
- Parameters:
- Returns:
Each item in result_list is a 2-tuple. The first item is bboxes with shape (n, 5), where 5 represents (tl_x, tl_y, br_x, br_y, score). The second item is labels with shape (n,).
- Return type:
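A short sketch of how a caller might consume one item of result_list, assuming the (bboxes, labels) layout described above. Nested lists stand in for tensors; `filter_by_score` and its `score_thr` parameter are hypothetical and not part of this API.

```python
from typing import List, Tuple


def filter_by_score(
    bboxes: List[List[float]], labels: List[int], score_thr: float
) -> Tuple[List[List[float]], List[int]]:
    """Keep detections whose score (fifth bbox column) is at least score_thr."""
    kept_bboxes, kept_labels = [], []
    for bbox, label in zip(bboxes, labels):
        if bbox[4] >= score_thr:
            kept_bboxes.append(bbox)
            kept_labels.append(label)
    return kept_bboxes, kept_labels


# One image with two detections: (tl_x, tl_y, br_x, br_y, score) plus a label each.
result_list = [([[10.0, 20.0, 50.0, 80.0, 0.9], [0.0, 0.0, 5.0, 5.0, 0.2]], [3, 7])]
bboxes, labels = result_list[0]
kept_bboxes, kept_labels = filter_by_score(bboxes, labels, score_thr=0.5)
print(kept_labels)  # [3]
```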
- static split_outputs(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, dn_meta: Dict[str, int]) Tuple[Tensor, ...] [source]#
Split outputs of the denoising part and the matching part.
Original implementation: split_outputs function of dino_head.py in mmdet3.x. What’s changed: None.
For the total outputs of length num_queries_total, the first num_denoising_queries outputs come from denoising queries and the remaining num_matching_queries outputs come from matching queries, where num_queries_total is the sum of num_denoising_queries and num_matching_queries.
- Parameters:
all_layers_cls_scores (Tensor) – Classification scores of all decoder layers, has shape (num_decoder_layers, bs, num_queries_total, cls_out_channels).
all_layers_bbox_preds (Tensor) – Regression outputs of all decoder layers. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_queries_total, 4).
dn_meta (Dict[str, int]) – The dictionary saves information about group collation, including ‘num_denoising_queries’ and ‘num_denoising_groups’.
- Returns:
a tuple containing the following outputs.
all_layers_matching_cls_scores (Tensor): Classification scores of all decoder layers in matching part, has shape (num_decoder_layers, bs, num_matching_queries, cls_out_channels).
all_layers_matching_bbox_preds (Tensor): Regression outputs of all decoder layers in matching part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_matching_queries, 4).
all_layers_denoising_cls_scores (Tensor): Classification scores of all decoder layers in denoising part, has shape (num_decoder_layers, bs, num_denoising_queries, cls_out_channels).
all_layers_denoising_bbox_preds (Tensor): Regression outputs of all decoder layers in denoising part. Each is a 4D-tensor with normalized coordinate format (cx, cy, w, h) and has shape (num_decoder_layers, bs, num_denoising_queries, 4).
- Return type:
Tuple[Tensor]
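The split described above amounts to slicing the query axis at num_denoising_queries. A minimal sketch using nested lists indexed as [layer][batch][query][channel]; `split_outputs_sketch` is a hypothetical stand-in and assumes dn_meta is always provided.

```python
from typing import Dict, List, Tuple


def _split_queries(outputs: List, num_dn: int) -> Tuple[List, List]:
    """Slice the query axis (third level of nesting) into matching and denoising parts."""
    denoising = [[queries[:num_dn] for queries in layer] for layer in outputs]
    matching = [[queries[num_dn:] for queries in layer] for layer in outputs]
    return matching, denoising


def split_outputs_sketch(
    all_layers_cls_scores: List, all_layers_bbox_preds: List, dn_meta: Dict[str, int]
) -> Tuple[List, List, List, List]:
    num_dn = dn_meta["num_denoising_queries"]
    matching_cls, denoising_cls = _split_queries(all_layers_cls_scores, num_dn)
    matching_box, denoising_box = _split_queries(all_layers_bbox_preds, num_dn)
    # Same order as the documented return value: matching first, then denoising.
    return matching_cls, matching_box, denoising_cls, denoising_box


# One decoder layer, batch size 1, three queries of which the first two are denoising.
cls_scores = [[[[0.1], [0.2], [0.3]]]]
bbox_preds = [[[[0.5, 0.5, 0.1, 0.1], [0.4, 0.4, 0.2, 0.2], [0.3, 0.3, 0.3, 0.3]]]]
dn_meta = {"num_denoising_queries": 2, "num_denoising_groups": 1}
m_cls, m_box, d_cls, d_box = split_outputs_sketch(cls_scores, bbox_preds, dn_meta)
print(m_cls)  # [[[[0.3]]]]
print(d_cls)  # [[[[0.1], [0.2]]]]
```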
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomFCNMaskHead(num_convs=4, roi_feat_size=14, in_channels=256, conv_kernel_size=3, conv_out_channels=256, num_classes=80, class_agnostic=False, upsample_cfg={'scale_factor': 2, 'type': 'deconv'}, conv_cfg=None, norm_cfg=None, predictor_cfg={'type': 'Conv'}, loss_mask={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_mask': True}, init_cfg=None)[source]#
Bases:
FCNMaskHead
Custom FCN Mask Head for fast mask evaluation.
Initialize BaseModule, inherited from torch.nn.Module
- get_scaled_seg_masks(*args, **kwargs)[source]#
Original method “get_seg_mask” from FCNMaskHead. Used in Semi-SL algorithm.
- get_seg_masks(mask_pred, det_bboxes, det_labels, rcnn_test_cfg, ori_shape, scale_factor, rescale)[source]#
Get segmentation masks from mask_pred and bboxes.
The original FCNMaskHead.get_seg_masks grid-samples 28 x 28 masks to the original image resolution. As a result, the resized masks occupy a large amount of memory and slow down inference. This method returns the 28 x 28 masks directly and resizes them to the bounding-box size in a post-processing step, which saves memory and speeds up inference.
- Parameters:
mask_pred (Tensor or ndarray) – shape (n, #class, h, w). For single-scale testing, mask_pred is the direct output of model, whose type is Tensor, while for multi-scale testing, it will be converted to numpy array outside of this method.
det_bboxes (Tensor) – shape (n, 4/5)
det_labels (Tensor) – shape (n, )
rcnn_test_cfg (dict) – rcnn testing config
ori_shape (Tuple) – original image height and width, shape (2,)
scale_factor (ndarray | Tensor) – If rescale is True, box coordinates are divided by this scale factor to fit ori_shape.
rescale (bool) – If True, the resulting masks will be rescaled to ori_shape.
- Returns:
Encoded masks. The c-th item in the outer list corresponds to the c-th class. Within the c-th item, the i-th entry of the inner list is the mask for the i-th box with class label c.
- Return type:
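A back-of-the-envelope calculation shows why keeping masks at 28 x 28 saves memory. The sketch assumes one byte per mask pixel, 100 detections and a 1920 x 1080 image; all three numbers are hypothetical and chosen only for illustration.

```python
def mask_bytes(num_masks: int, height: int, width: int, bytes_per_pixel: int = 1) -> int:
    """Memory needed to hold num_masks dense masks of the given size."""
    return num_masks * height * width * bytes_per_pixel


full_res = mask_bytes(100, 1080, 1920)  # masks resized to the original image
compact = mask_bytes(100, 28, 28)       # masks kept at the head's output size
print(full_res)             # 207360000
print(compact)              # 78400
print(full_res // compact)  # 2644
```

Under these assumptions the full-resolution masks need roughly 2600 times more memory than the compact ones, before any encoding is applied.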
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRPNHead(in_channels, init_cfg={'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'}, num_convs=1, **kwargs)[source]#
Bases:
RPNHead
RPN head.
- Parameters:
Initialize BaseModule, inherited from torch.nn.Module
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRetinaHead(*args, bg_loss_weight=-1.0, **kwargs)[source]#
Bases:
RetinaHead
CustomRetinaHead class for OTX.
Initialize BaseModule, inherited from torch.nn.Module
- loss_single(cls_score, bbox_pred, anchors, labels, label_weights, bbox_targets, bbox_weights, num_total_samples)[source]#
Compute loss of a single scale level.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).
num_total_samples (int) – If sampling, num total samples equal to the number of total anchors; Otherwise, it is the number of positive anchors.
- Returns:
A dictionary of loss components.
- Return type:
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomRoIHead(bbox_roi_extractor=None, bbox_head=None, mask_roi_extractor=None, mask_head=None, shared_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]#
Bases:
StandardRoIHead
CustomRoIHead class for OTX.
Initialize BaseModule, inherited from torch.nn.Module
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomSSDHead(*args, bg_loss_weight=-1.0, loss_cls=None, loss_balancing=False, **kwargs)[source]#
Bases:
SSDHead
CustomSSDHead class for OTX.
Initialize BaseModule, inherited from torch.nn.Module
- forward(feats)[source]#
Forward features from the upstream network.
- Parameters:
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns:
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_anchors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_anchors * 4.
- Return type:
- loss(cls_scores, bbox_preds, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
Loss function.
- loss_single(cls_score, bbox_pred, anchor, labels, label_weights, bbox_targets, bbox_weights, num_total_samples)[source]#
Compute loss of a single image.
- Parameters:
cls_score (Tensor) – Box scores for each image, with shape (num_total_anchors, num_classes).
bbox_pred (Tensor) – Box energies / deltas for each image with shape (num_total_anchors, 4).
anchor (Tensor) – Box reference for each scale level with shape (num_total_anchors, 4).
labels (Tensor) – Labels of each anchor with shape (num_total_anchors,).
label_weights (Tensor) – Label weights of each anchor with shape (num_total_anchors,)
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (num_total_anchors, 4).
bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (num_total_anchors, 4).
num_total_samples (int) – If sampling, num total samples equal to the number of total anchors; Otherwise, it is the number of positive anchors.
- Returns:
A dictionary of loss components.
- Return type:
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomVFNetHead(*args, bg_loss_weight=-1.0, **kwargs)[source]#
Bases:
CrossDatasetDetectorHead, VFNetHead
CustomVFNetHead class for OTX.
Initialize BaseModule, inherited from torch.nn.Module
- get_targets(cls_scores, mlvl_points, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore)[source]#
A wrapper for computing ATSS and FCOS targets for points in multiple images.
- Parameters:
cls_scores (list[Tensor]) – Box iou-aware scores for each scale level with shape (N, num_points * num_classes, H, W).
mlvl_points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
gt_bboxes (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).
gt_labels (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | Tensor) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).
- Returns:
labels_list (list[Tensor]): Labels of each level.
label_weights (Tensor/None): Label weights of all levels.
bbox_targets_list (list[Tensor]): Regression targets of each level, (l, t, r, b).
bbox_weights (Tensor/None): Bbox weights of all levels.
- Return type:
- loss(cls_scores, bbox_preds, bbox_preds_refine, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
Compute loss of the head.
- Parameters:
cls_scores (list[Tensor]) – Box iou-aware scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box offsets for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
bbox_preds_refine (list[Tensor]) – Refined Box offsets for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss. Default: None.
- Returns:
A dictionary of loss components.
- Return type:
- class otx.algorithms.detection.adapters.mmdet.models.heads.CustomYOLOXHead(*args, **kwargs)[source]#
Bases:
YOLOXHead
CustomYOLOXHead class for OTX.
Initialize BaseModule, inherited from torch.nn.Module
- forward_single(x, cls_convs, reg_convs, conv_cls, conv_reg, conv_obj)[source]#
Forward feature of a single scale level.
- loss(cls_scores, bbox_preds, objectnesses, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore=None)[source]#
Compute loss of the head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.
- class otx.algorithms.detection.adapters.mmdet.models.heads.DETRHeadExtension(init_cfg: dict | None = None)[source]#
Bases:
BaseModule
Head of DETR. DETR: End-to-End Object Detection with Transformers.
Original implementation: DETRHead of detr_head.py in mmdet3.x. What’s changed: The data type of batch_gt_instances is changed from InstanceList to List[Config]. Since InstanceList is a new data type from mmdet3.x, List[Config] replaces it.
Initialize BaseModule, inherited from torch.nn.Module
- loss_by_feat(all_layers_cls_scores: Tensor, all_layers_bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict], batch_gt_instances_ignore=None) Dict[str, Tensor] [source]#
Loss function.
Only outputs from the last feature level are used for computing losses by default.
- Parameters:
all_layers_cls_scores (Tensor) – Classification outputs of all decoder layers. A 4D-tensor with shape (num_decoder_layers, bs, num_queries, cls_out_channels).
all_layers_bbox_preds (Tensor) – Sigmoid regression outputs of all decoder layers. A 4D-tensor with normalized coordinate format (cx, cy, w, h) and shape (num_decoder_layers, bs, num_queries, 4).
batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes the bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
- loss_by_feat_single(cls_scores: Tensor, bbox_preds: Tensor, batch_gt_instances: List[Config], batch_img_metas: List[dict]) Tuple[Tensor, Tensor, Tensor] [source]#
Loss function for outputs from a single decoder layer of a single feature level.
- Parameters:
cls_scores (Tensor) – Box score logits from a single decoder layer for all images, has shape (bs, num_queries, cls_out_channels).
bbox_preds (Tensor) – Sigmoid outputs from a single decoder layer for all images, with normalized coordinate (cx, cy, w, h) and shape (bs, num_queries, 4).
batch_gt_instances (List[Config]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns:
A tuple including loss_cls, loss_box and loss_iou.
- Return type:
Tuple[Tensor]