otx.algo.detection.losses

Custom OTX Losses for Object Detection.

Classes

ATSSCriterion(num_classes, bbox_coder, ...)

ATSSCriterion is a loss criterion used in the Adaptive Training Sample Selection (ATSS) algorithm.

DetrCriterion(weight_dict[, alpha, gamma, ...])

This class computes the loss for DETR.

RTMDetCriterion(num_classes, loss_cls, loss_bbox)

RTMDetCriterion is a criterion module for RTM-based object detection.

SSDCriterion(num_classes[, bbox_coder, ...])

SSDCriterion is a loss criterion for Single Shot MultiBox Detector (SSD).

YOLOv9Criterion(num_classes, vec2box[, ...])

YOLOv9 criterion module.

YOLOXCriterion(num_classes[, loss_cls, ...])

YOLOX criterion module.

DFINECriterion(weight_dict[, alpha, gamma, ...])

D-Fine criterion with FGL and DDF losses.

class otx.algo.detection.losses.ATSSCriterion(num_classes: int, bbox_coder: Module, loss_cls: Module, loss_bbox: Module, loss_centerness: Module | None = None, use_qfl: bool = False, qfl_cfg: dict | None = None, reg_decoded_bbox: bool = True, bg_loss_weight: float = -1.0)

Bases: Module

ATSSCriterion is a loss criterion used in the Adaptive Training Sample Selection (ATSS) algorithm.

Parameters:
  • num_classes (int) – The number of object classes.

  • bbox_coder (nn.Module) – The module used for encoding and decoding bounding box coordinates.

  • loss_cls (nn.Module) – The module used for calculating the classification loss.

  • loss_bbox (nn.Module) – The module used for calculating the bounding box regression loss.

  • loss_centerness (nn.Module | None, optional) – The module used for calculating the centerness loss. Defaults to None.

  • use_qfl (bool, optional) – Whether to use the Quality Focal Loss (QFL). Defaults to False.

  • qfl_cfg (dict | None, optional) – Configuration for the Quality Focal Loss. Defaults to None.

  • reg_decoded_bbox (bool, optional) – Whether to use the decoded bounding box coordinates for regression loss calculation. Defaults to True.

  • bg_loss_weight (float, optional) – The weight for the background loss. Defaults to -1.0.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

centerness_target(anchors: Tensor, gts: Tensor) → Tensor

Calculate the centerness between anchors and gts.

Only calculate pos centerness targets, otherwise there may be nan.

Parameters:
  • anchors (Tensor) – Anchors with shape (N, 4), “xyxy” format.

  • gts (Tensor) – Ground truth bboxes with shape (N, 4), “xyxy” format.

Returns:

Centerness between anchors and gts.

Return type:

Tensor
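As an illustration of the centerness computation described above, here is a minimal sketch for a single anchor and ground-truth pair. It assumes the widely used FCOS/ATSS-style definition (the square root of the product of the left/right and top/bottom distance ratios, measured from the anchor center). It is written in plain Python rather than on tensors, and it is not taken from the OTX source.

```python
import math


def centerness_target(anchor, gt):
    """Centerness of one anchor w.r.t. one ground-truth box, both in (x1, y1, x2, y2)."""
    # Center of the anchor box.
    cx = (anchor[0] + anchor[2]) / 2
    cy = (anchor[1] + anchor[3]) / 2
    # Distances from the anchor center to the four sides of the ground-truth box.
    left, right = cx - gt[0], gt[2] - cx
    top, bottom = cy - gt[1], gt[3] - cy
    # Ratio of the shorter to the longer distance along each axis.
    lr_ratio = min(left, right) / max(left, right)
    tb_ratio = min(top, bottom) / max(top, bottom)
    return math.sqrt(lr_ratio * tb_ratio)


centered = centerness_target((4, 4, 6, 6), (0, 0, 10, 10))
off_center = centerness_target((1, 4, 3, 6), (0, 0, 10, 10))
```

An anchor whose center lies outside the ground-truth box produces a negative distance and therefore a negative value under the square root, which is one way the nan mentioned above can arise for non-positive samples.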

forward(anchors: Tensor, cls_score: Tensor, bbox_pred: Tensor, centerness: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, valid_label_mask: Tensor, avg_factor: float) → dict[str, Tensor]

Compute loss of a single scale level.

Parameters:
  • anchors (Tensor) – Box reference for scale levels with shape (N, num_total_anchors, 4).

  • cls_score (Tensor) – Box scores for scale levels with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for scale levels with shape (N, num_anchors * 4, H, W).

  • centerness (Tensor) – Centerness scores for each scale level.

  • labels (Tensor) – Labels of anchors with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of anchors with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of anchors with shape (N, num_total_anchors, 4).

  • valid_label_mask (Tensor) – Label mask for consideration of ignored label with shape (N, num_total_anchors, 1).

  • avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algo.detection.losses.DFINECriterion(weight_dict: dict[str, int | float], alpha: float = 0.2, gamma: float = 2.0, num_classes: int = 80, reg_max: int = 32)

Bases: Module

D-Fine criterion with FGL and DDF losses.

TODO(Eugene): Consider merge with RTDETRCriterion in the next PR.

The process happens in two steps:
  1. we compute Hungarian assignment between ground truth boxes and the outputs of the model

  2. we supervise each pair of matched ground-truth / prediction (supervise class and box)

Parameters:
  • weight_dict (dict[str, int | float]) – A dictionary containing the weights for different loss components.

  • alpha (float, optional) – The alpha parameter for the loss calculation. Defaults to 0.2.

  • gamma (float, optional) – The gamma parameter for the loss calculation. Defaults to 2.0.

  • num_classes (int, optional) – The number of classes. Defaults to 80.

  • reg_max (int, optional) – The maximum number of bin targets. Defaults to 32.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

static fgl_loss(preds: Tensor, targets: Tensor, weight_right: Tensor, weight_left: Tensor, iou_weight: Tensor | None = None, reduction: str = 'sum', avg_factor: float | None = None) → Tensor

Fine-Grained Localization (FGL) Loss.

Parameters:
  • preds (Tensor) – predicted distances

  • targets (Tensor) – target distances

  • weight_right (Tensor) – weight for right distance

  • weight_left (Tensor) – weight for left distance

  • iou_weight (Tensor, optional) – IoU weight. Defaults to None.

  • reduction (str, optional) – reduction method. Defaults to “sum”.

  • avg_factor (float, optional) – average factor. Defaults to None.

Returns:

FGL loss

Return type:

Tensor
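The role of weight_left and weight_right can be sketched for a single box edge. The sketch assumes that preds holds logits over discrete distance bins and that the loss is a weighted sum of the cross-entropies against the two bins that bracket the target, as in distribution-focal-style losses. This is an assumption about the formulation, not the OTX implementation, and the reduction and avg_factor handling are omitted.

```python
import math


def cross_entropy(logits, index):
    """Negative log-softmax of `logits` at position `index`."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[index]


def fgl_loss_single(logits, target, weight_right, weight_left, iou_weight=1.0):
    """Weighted cross-entropy against the bins to the left and right of `target`."""
    left_bin = int(target)   # bin just below the (fractional) target distance
    right_bin = left_bin + 1  # bin just above it
    loss = (
        cross_entropy(logits, left_bin) * weight_left
        + cross_entropy(logits, right_bin) * weight_right
    )
    return loss * iou_weight


# Uniform logits over 4 bins, target halfway between bins 1 and 2.
loss = fgl_loss_single([0.0, 0.0, 0.0, 0.0], 1.5, weight_right=0.5, weight_left=0.5)
```

With uniform logits every bin has probability 1/4, so the result equals log(4) regardless of how the two weights are split, as long as they sum to one.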

forward(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]]) → dict[str, Tensor]

This performs the loss computation.

Parameters:
  • outputs (dict[str, torch.Tensor]) – dict of tensors, see the output specification of the model for the format

  • targets (list[dict[str, torch.Tensor]]) – list of dicts, such that len(targets) == batch_size. The expected keys in each dict depend on the losses applied; see the documentation of each loss.

Returns:

dict of losses

Return type:

dict[str, torch.Tensor]
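To show how a weight_dict relates named loss components to a single scalar, here is a minimal sketch. The key names and values are hypothetical, and this page does not state whether the weights are applied inside forward or by the caller, so treat the placement of this step as an assumption.

```python
def weighted_total(losses, weight_dict):
    """Sum the loss terms that have an entry in weight_dict, scaled by their weights."""
    return sum(
        value * weight_dict[name]
        for name, value in losses.items()
        if name in weight_dict
    )


# Hypothetical component names and values, for illustration only.
losses = {"loss_vfl": 0.8, "loss_bbox": 0.1, "loss_giou": 0.4}
weight_dict = {"loss_vfl": 1, "loss_bbox": 5, "loss_giou": 2}
total = weighted_total(losses, weight_dict)
```

Components without an entry in weight_dict are skipped, which is a common way to log auxiliary terms without letting them contribute to the gradient.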

static get_cdn_matched_indices(dn_meta: dict[str, list[Tensor]], targets: list[dict[str, Tensor]]) → list[tuple[Tensor, Tensor]]

Get the matched indices for the contrastive denoising (CDN) queries.

Parameters:
  • dn_meta (dict[str, list[torch.Tensor]]) – meta data for cdn

  • targets (list[dict[str, torch.Tensor]]) – targets

loss_boxes(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]], indices: list[tuple[int, int]], num_boxes: int) → dict[str, Tensor]

Compute the losses related to the bounding boxes: the L1 regression loss and the GIoU loss.

Targets dicts must contain the key “boxes” containing a tensor of dim [nb_target_boxes, 4]. The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.

Parameters:
  • outputs (dict[str, Tensor]) – The outputs of the model.

  • targets (list[dict[str, Tensor]]) – The targets.

  • indices (list[tuple[int, int]]) – The indices of the matched boxes.

  • num_boxes (int) – The number of boxes.

Returns:

The losses.

Return type:

dict[str, Tensor]
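The two box terms can be sketched for a single matched pair. The helpers below are illustrative plain-Python stand-ins rather than OTX functions: they convert (center_x, center_y, w, h) boxes to corner format, then compute an L1 distance and the generalized IoU.

```python
def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def l1_loss(pred, target):
    """Sum of absolute coordinate differences."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


pred = (0.5, 0.5, 0.5, 0.25)
target = (0.5, 0.5, 0.5, 0.25)
loss_bbox = l1_loss(pred, target)
loss_giou = 1.0 - giou(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target))
```

Both terms are zero for a perfect prediction. The GIoU term stays informative even when the boxes do not overlap, because the enclosing-box penalty still changes with their distance.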

loss_labels_vfl(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]], indices: list[tuple[int, int]], num_boxes: int) → dict[str, Tensor]

Varifocal Loss (VFL) for label prediction.

Parameters:
  • outputs (dict[str, Tensor]) – Model outputs.

  • targets (list[dict[str, Tensor]]) – List of target dictionaries.

  • indices (list[tuple[int, int]]) – List of tuples of indices.

  • num_boxes (int) – Number of predicted boxes.

Returns:

The loss dictionary.

Return type:

dict[str, Tensor]
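For readers unfamiliar with Varifocal Loss, the following sketch shows a commonly used formulation for a single (query, class) logit. It assumes that positives are weighted by their IoU-based target score and that negatives are down-weighted by alpha * p ** gamma. The exact weighting and normalization used by this class are not stated on this page, so the details are assumptions.

```python
import math


def varifocal_loss(logit, target_score, is_positive, alpha=0.2, gamma=2.0):
    """Varifocal loss for one logit.

    target_score is the IoU between the matched predicted box and its
    ground truth for the ground-truth class, and 0 everywhere else.
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    if is_positive:
        weight = target_score           # positives: weighted by box quality
    else:
        weight = alpha * p ** gamma     # negatives: focal-style down-weighting
    bce = -(target_score * math.log(p) + (1.0 - target_score) * math.log(1.0 - p))
    return weight * bce


negative = varifocal_loss(0.0, 0.0, is_positive=False)
positive = varifocal_loss(0.0, 0.8, is_positive=True)
```

The asymmetry is the point of the loss: confident false positives are penalized progressively, while well-localized positives contribute more than poorly localized ones.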

loss_local(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]], indices: list[tuple[int, int]], num_boxes: int, temperature: int = 5) → dict[str, Tensor]

Compute Fine-Grained Localization (FGL) Loss and Decoupled Distillation Focal (DDF) Loss.

Parameters:
  • outputs (dict[str, Tensor]) – The outputs of the model.

  • targets (list[dict[str, Tensor]]) – The targets.

  • indices (list[tuple[int, int]]) – The indices of the matched boxes.

  • num_boxes (int) – The number of boxes.

  • temperature (int, optional) – Temperature for distillation. Defaults to 5.

Returns:

FGL and DDF losses.

Return type:

dict[str, Tensor]
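The effect of the temperature parameter can be sketched with a generic temperature-scaled distillation term: a KL divergence between softened teacher and student distributions, scaled by the squared temperature. This is a standard knowledge-distillation formulation used here as an assumption. It omits the decoupled weighting of matched and unmatched predictions that the DDF loss name implies.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by `temperature` (higher means a flatter distribution)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=5):
    """KL(teacher || student) on temperature-softened distributions, scaled by T**2."""
    student = softmax(student_logits, temperature)
    teacher = softmax(teacher_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    return temperature ** 2 * kl


same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([0.0, 0.0, 0.0], [3.0, 0.0, 0.0])
```

The T**2 factor is the usual compensation for the gradient shrinking as the distributions are softened.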

class otx.algo.detection.losses.DetrCriterion(weight_dict: dict[str, int | float], alpha: float = 0.2, gamma: float = 2.0, num_classes: int = 80)

Bases: Module

This class computes the loss for DETR.

The process happens in two steps:
  1. we compute Hungarian assignment between ground truth boxes and the outputs of the model

  2. we supervise each pair of matched ground-truth / prediction (supervise class and box)

Parameters:
  • weight_dict (dict[str, int | float]) – A dictionary containing the weights for different loss components.

  • alpha (float, optional) – The alpha parameter for the loss calculation. Defaults to 0.2.

  • gamma (float, optional) – The gamma parameter for the loss calculation. Defaults to 2.0.

  • num_classes (int, optional) – The number of classes. Defaults to 80.

Create the criterion.
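The first of the two steps described above, the assignment of predictions to ground-truth boxes, can be illustrated on a tiny cost matrix. The sketch below uses a brute-force search over permutations instead of the Hungarian algorithm. It finds the same minimum-cost one-to-one assignment but is only practical for a handful of boxes. The cost values are made up for illustration.

```python
from itertools import permutations


def match(cost):
    """Return the minimum-cost one-to-one (prediction, ground_truth) pairs.

    cost[i][j] is the cost of assigning prediction i to ground truth j.
    Requires at least as many predictions as ground-truth boxes.
    """
    num_preds, num_gts = len(cost), len(cost[0])
    best = min(
        permutations(range(num_preds), num_gts),
        key=lambda preds: sum(cost[p][g] for g, p in enumerate(preds)),
    )
    return sorted((p, g) for g, p in enumerate(best))


# Three predictions, two ground-truth boxes (hypothetical matching costs).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
pairs = match(cost)
```

Here prediction 0 is matched to ground truth 1 and prediction 1 to ground truth 0. Prediction 2 stays unmatched and would be supervised as background in the second step.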

forward(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]]) → dict[str, Tensor]

This performs the loss computation.

Parameters:
  • outputs (dict[str, torch.Tensor]) – dict of tensors, see the output specification of the model for the format

  • targets (list[dict[str, torch.Tensor]]) – list of dicts, such that len(targets) == batch_size. The expected keys in each dict depend on the losses applied; see the documentation of each loss.

static get_cdn_matched_indices(dn_meta: dict[str, list[Tensor]], targets: list[dict[str, Tensor]]) → list[tuple[Tensor, Tensor]]

Get the matched indices for the contrastive denoising (CDN) queries.

Parameters:
  • dn_meta (dict[str, list[torch.Tensor]]) – meta data for cdn

  • targets (list[dict[str, torch.Tensor]]) – targets

loss_boxes(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]], indices: list[tuple[int, int]], num_boxes: int) → dict[str, Tensor]

Compute the losses related to the bounding boxes: the L1 regression loss and the GIoU loss.

Targets dicts must contain the key “boxes” containing a tensor of dim [nb_target_boxes, 4]. The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.

Parameters:
  • outputs (dict[str, torch.Tensor]) – The outputs of the model.

  • targets (list[dict[str, torch.Tensor]]) – The targets.

  • indices (list[tuple[int, int]]) – The indices of the matched boxes.

  • num_boxes (int) – The number of boxes.

Returns:

The losses.

Return type:

dict[str, torch.Tensor]

loss_labels_vfl(outputs: dict[str, Tensor], targets: list[dict[str, Tensor]], indices: list[tuple[int, int]], num_boxes: int) → dict[str, Tensor]

Compute the Varifocal Loss (VFL) for label prediction.

Parameters:
  • outputs (dict[str, torch.Tensor]) – Model outputs.

  • targets (list[dict[str, torch.Tensor]]) – List of target dictionaries.

  • indices (list[tuple[int, int]]) – List of tuples of indices.

  • num_boxes (int) – Number of predicted boxes.

class otx.algo.detection.losses.RTMDetCriterion(num_classes: int, loss_cls: Module, loss_bbox: Module)

Bases: Module

RTMDetCriterion is a criterion module for RTM-based object detection.

Parameters:
  • num_classes (int) – Number of object classes.

  • loss_cls (nn.Module) – Classification loss module.

  • loss_bbox (nn.Module) – Bounding box regression loss module.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(cls_score: Tensor, bbox_pred: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, assign_metrics: Tensor, stride: list[int], **kwargs) → dict[str, Tensor]

Compute loss of a single scale level.

Parameters:
  • cls_score (Tensor) – Box scores for scale levels with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Decoded bboxes for scale levels with shape (N, num_anchors * 4, H, W).

  • labels (Tensor) – Labels of anchors with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of anchors with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of anchors with shape (N, num_total_anchors, 4).

  • assign_metrics (Tensor) – Assign metrics with shape (N, num_total_anchors).

  • stride (list[int]) – Downsample stride of the feature map.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class otx.algo.detection.losses.SSDCriterion(num_classes: int, bbox_coder: Module | None = None, neg_pos_ratio: int = 3, reg_decoded_bbox: bool = False, smoothl1_beta: float = 1.0)

Bases: Module

SSDCriterion is a loss criterion for Single Shot MultiBox Detector (SSD).

Parameters:
  • num_classes (int) – Number of classes including the background class.

  • bbox_coder (nn.Module) – Bounding box coder module. Defaults to None.

  • neg_pos_ratio (int, optional) – Ratio of negative to positive samples. Defaults to 3.

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to False. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • smoothl1_beta (float, optional) – Beta parameter for the smooth L1 loss. Defaults to 1.0.

Initialize internal Module state, shared by both nn.Module and ScriptModule.
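The neg_pos_ratio parameter is typically used for hard negative mining, which can be sketched as follows. The sketch assumes the standard SSD procedure: keep all positive anchors and only the negatives with the highest classification loss, up to neg_pos_ratio negatives per positive. The background label index and the function name are hypothetical, and the code is not taken from the OTX source.

```python
def hard_negative_mining(cls_losses, labels, background_label, neg_pos_ratio=3):
    """Select positive anchors and the hardest negative anchors.

    Returns two lists of anchor indices: all positives, and the
    neg_pos_ratio * num_pos negatives with the largest classification loss.
    """
    pos = [i for i, label in enumerate(labels) if label != background_label]
    neg = [i for i, label in enumerate(labels) if label == background_label]
    num_neg = min(neg_pos_ratio * len(pos), len(neg))
    hardest = sorted(neg, key=lambda i: cls_losses[i], reverse=True)[:num_neg]
    return pos, sorted(hardest)


# One positive anchor (label 0) and five background anchors (label 2).
labels = [0, 2, 2, 2, 2, 2]
cls_losses = [1.2, 0.1, 0.9, 0.3, 0.7, 0.5]
pos_idx, neg_idx = hard_negative_mining(cls_losses, labels, background_label=2)
```

Only the selected anchors would then contribute to the classification loss, which keeps the large number of easy background anchors from dominating training.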

forward(cls_score: Tensor, bbox_pred: Tensor, anchor: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, avg_factor: int) → dict[str, Tensor]

Compute losses of images.

Parameters:
  • cls_score (Tensor) – Box scores for images have shape (N, num_total_anchors, num_classes).

  • bbox_pred (Tensor) – Box energies / deltas for image levels with shape (N, num_total_anchors, 4).

  • anchor (Tensor) – Box reference for scale levels with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of anchors with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of anchors with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of anchors with shape (N, num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of anchors with shape (N, num_total_anchors, 4).

  • avg_factor (int) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Returns:

A dictionary of loss components. The dict has the following components:

  • loss_cls (list[Tensor]): A list containing each feature map classification loss.

  • loss_bbox (list[Tensor]): A list containing each feature map regression loss.

Return type:

dict[str, Tensor]

class otx.algo.detection.losses.YOLOXCriterion(num_classes: int, loss_cls: Module | None = None, loss_bbox: Module | None = None, loss_obj: Module | None = None, loss_l1: Module | None = None, use_l1: bool = False)

Bases: Module

YOLOX criterion module.

This module calculates the loss for YOLOX object detection model.

Parameters:
  • num_classes (int) – The number of classes.

  • loss_cls (nn.Module | None) – The classification loss module. Defaults to None.

  • loss_bbox (nn.Module | None) – The bounding box regression loss module. Defaults to None.

  • loss_obj (nn.Module | None) – The objectness loss module. Defaults to None.

  • loss_l1 (nn.Module | None) – The L1 loss module. Defaults to None.

  • use_l1 (bool) – Whether to use the L1 loss. Defaults to False.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(flatten_objectness: Tensor, flatten_cls_preds: Tensor, flatten_bbox_preds: Tensor, flatten_bboxes: Tensor, obj_targets: Tensor, cls_targets: Tensor, bbox_targets: Tensor, l1_targets: Tensor, num_total_samples: Tensor, num_pos: Tensor, pos_masks: Tensor) → dict[str, Tensor]

Forward pass of the YOLOX criterion module.

Parameters:
  • flatten_objectness (Tensor) – Flattened objectness predictions.

  • flatten_cls_preds (Tensor) – Flattened class predictions.

  • flatten_bbox_preds (Tensor) – Flattened bounding box predictions.

  • flatten_bboxes (Tensor) – Flattened ground truth bounding boxes.

  • obj_targets (Tensor) – Objectness targets.

  • cls_targets (Tensor) – Class targets.

  • bbox_targets (Tensor) – Bounding box targets.

  • l1_targets (Tensor) – L1 targets.

  • num_total_samples (Tensor) – Total number of samples.

  • num_pos (Tensor) – Number of positive samples.

  • pos_masks (Tensor) – Positive masks.

Returns:

A dictionary containing the calculated losses.

Return type:

dict[str, Tensor]

class otx.algo.detection.losses.YOLOv9Criterion(num_classes: int, vec2box: Vec2Box, loss_cls: Module | None = None, loss_dfl: Module | None = None, loss_iou: Module | None = None, reg_max: int = 16, cls_rate: float = 0.5, dfl_rate: float = 1.5, iou_rate: float = 7.5, aux_rate: float = 0.25)

Bases: Module

YOLOv9 criterion module.

This module calculates the loss for YOLOv9 object detection model.

Parameters:
  • num_classes (int) – The number of classes.

  • vec2box (Vec2Box) – The Vec2Box object.

  • loss_cls (nn.Module | None) – The classification loss module. Defaults to None.

  • loss_dfl (nn.Module | None) – The DFLoss loss module. Defaults to None.

  • loss_iou (nn.Module | None) – The IoULoss loss module. Defaults to None.

  • reg_max (int, optional) – Maximum number of anchor regions. Defaults to 16.

  • cls_rate (float, optional) – The classification loss rate. Defaults to 0.5.

  • dfl_rate (float, optional) – The DFLoss loss rate. Defaults to 1.5.

  • iou_rate (float, optional) – The IoU loss rate. Defaults to 7.5.

  • aux_rate (float, optional) – The auxiliary loss rate. Defaults to 0.25.

Initialize internal Module state, shared by both nn.Module and ScriptModule.
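To make the role of the four rate parameters concrete, here is a minimal sketch of one plausible way they combine the per-component losses of the main and auxiliary predictions. The combination rule and the component key names are assumptions for illustration, since this page does not spell them out. The default values are those of the class signature.

```python
def combine_losses(main, aux=None, cls_rate=0.5, dfl_rate=1.5, iou_rate=7.5, aux_rate=0.25):
    """Weighted sum of loss components, with an optional down-weighted auxiliary branch."""

    def weighted(components):
        return (
            cls_rate * components["cls"]
            + dfl_rate * components["dfl"]
            + iou_rate * components["iou"]
        )

    total = weighted(main)
    if aux is not None:
        # The auxiliary branch contributes at a reduced rate.
        total += aux_rate * weighted(aux)
    return total


# Hypothetical per-component loss values.
main = {"cls": 1.0, "dfl": 2.0, "iou": 1.0}
main_only = combine_losses(main)
with_aux = combine_losses(main, aux=main)
```

With these defaults the IoU term dominates the total, which matches the relative magnitudes of the rates in the signature.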

forward(main_preds: tuple[Tensor, Tensor, Tensor], targets: Tensor, aux_preds: tuple[Tensor, Tensor, Tensor] | None = None) → dict[str, Tensor] | None

Forward pass of the YOLOv9 criterion module.

Parameters:
  • main_preds (tuple[Tensor, Tensor, Tensor]) – The main predictions.

  • targets (Tensor) – The learning target of the prediction.

  • aux_preds (tuple[Tensor, Tensor, Tensor], optional) – The auxiliary predictions. Defaults to None.

Returns:

The loss dictionary.

Return type:

dict[str, Tensor] | None

separate_anchor(anchors: Tensor) → tuple[Tensor, Tensor]

Separate anchor and bounding box.

Parameters:

anchors (Tensor) – The anchor tensor.