Evaluation#

This module contains the implementation of the Accuracy performance provider.

class otx.api.usecases.evaluation.accuracy.Accuracy(resultset: ResultSetEntity, average: MetricAverageMethod = MetricAverageMethod.MICRO)#

This class is responsible for providing Accuracy measures; mainly for Classification problems.

The calculation supports both multi-label and binary label predictions.

Accuracy is the proportion of correctly predicted labels to the total number of labels (predicted and actual) for that instance. Overall accuracy is the average across all instances.

Args:

resultset (ResultSetEntity): ResultSet that the score will be computed for
average (MetricAverageMethod, optional): The averaging method, either MICRO or MACRO

  • MICRO: compute the average over all predictions in all label groups
  • MACRO: compute accuracy per label group and return the average of the per-label-group accuracy scores

property accuracy: ScoreMetric#

Returns the accuracy as ScoreMetric.

get_performance() Performance#

Returns the performance with accuracy and confusion metrics.
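
A minimal usage sketch of the Accuracy class, assuming a populated classification ResultSetEntity is already available (the resultset variable is hypothetical):

    from otx.api.usecases.evaluation.accuracy import Accuracy
    from otx.api.usecases.evaluation.averaging import MetricAverageMethod

    # `resultset` is assumed to be a populated ResultSetEntity with ground truth
    # and predicted classification labels.
    metric = Accuracy(resultset=resultset, average=MetricAverageMethod.MACRO)

    print(metric.accuracy.value)            # overall accuracy score
    performance = metric.get_performance()  # accuracy plus confusion-matrix dashboard metrics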

otx.api.usecases.evaluation.accuracy.compute_unnormalized_confusion_matrices_from_resultset(resultset: ResultSetEntity) List[MatrixMetric]#

Computes an (unnormalized) confusion matrix for every label group in the resultset.

Args:

resultset: the input resultset

Returns:

the computed unnormalized confusion matrices

otx.api.usecases.evaluation.accuracy.precision_metrics_group(confusion_matrix: MatrixMetric) MetricsGroup#

Computes the precision per class based on a confusion matrix and returns them as ScoreMetrics in a MetricsGroup.

Args:

confusion_matrix: matrix to compute the precision per class for

Returns:

a BarMetricsGroup with the per class precision.

otx.api.usecases.evaluation.accuracy.recall_metrics_group(confusion_matrix: MatrixMetric) MetricsGroup#

Computes the recall per class based on a confusion matrix and returns them as ScoreMetrics in a MetricsGroup.

Args:

confusion_matrix: matrix to compute the recall per class for

Returns:

a BarMetricsGroup with the per class recall

This module contains the implementations of performance providers for multi-score anomaly metrics.

class otx.api.usecases.evaluation.anomaly_metrics.AnomalyDetectionScores(resultset: ResultSetEntity)#

Performance provider for anomaly detection tasks.

class otx.api.usecases.evaluation.anomaly_metrics.AnomalyLocalizationPerformance(global_score: ScoreMetric, local_score: Optional[ScoreMetric], dashboard_metrics: Optional[List[MetricsGroup]])#

Anomaly specific MultiScorePerformance.

This class implements a special case of the MultiScorePerformance, specific for anomaly tasks that perform anomaly localization (detection/segmentation), in addition to anomaly classification.

Args:

global_score: Image-level performance metric.
local_score: Pixel- or bbox-level performance metric, depending on the task type.
dashboard_metrics: (optional) additional statistics, containing charts, curves, and other additional info.

property global_score#

Return the global (image-level) score metric.

property local_score#

Return the local (pixel-/bbox-level) score metric.

class otx.api.usecases.evaluation.anomaly_metrics.AnomalyLocalizationScores(resultset: ResultSetEntity)#

AnomalyLocalizationPerformance object for anomaly segmentation and anomaly detection tasks.

Depending on the subclass, the get_performance method returns an AnomalyLocalizationPerformance object with the pixel- or bbox-level metric as the primary score. The global (image-level) performance metric is included as an additional metric.

Args:

resultset: ResultSet that scores will be computed for

get_performance() Performance#

Return the performance object for the resultset.

class otx.api.usecases.evaluation.anomaly_metrics.AnomalySegmentationScores(resultset: ResultSetEntity)#

Performance provider for anomaly segmentation tasks.

This module contains the averaging method enumeration.

class otx.api.usecases.evaluation.averaging.MetricAverageMethod(value)#

This defines the metrics averaging method.

MACRO = 2#
MICRO = 1#

This module contains functions for basic operations.

otx.api.usecases.evaluation.basic_operations.NumberPerLabel#

Dictionary storing a number for each label. The None key represents “all labels”

alias of Dict[Optional[LabelEntity], int]

otx.api.usecases.evaluation.basic_operations.divide_arrays_with_possible_zeros(array1: ndarray, array2: ndarray) ndarray#

Divide two arrays element-wise, handling zeros in the denominator.

The denominator in a precision or recall computation can sometimes contain a zero; in that case, a zero is returned for that element (https://stackoverflow.com/a/32106804).

Args:

array1: the numerator
array2: the denominator

Returns:

the divided arrays (numerator/denominator) with a value of zero where the denominator was zero.
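
The zero-safe division can be illustrated with plain NumPy; the sketch below mirrors the behaviour described above rather than calling the OTX function itself:

    import numpy as np

    def safe_divide(numerator: np.ndarray, denominator: np.ndarray) -> np.ndarray:
        """Element-wise division that returns 0 where the denominator is 0."""
        return np.divide(
            numerator,
            denominator,
            out=np.zeros_like(numerator, dtype=float),
            where=denominator != 0,
        )

    true_positives = np.array([5, 0, 3])
    predicted_positives = np.array([10, 0, 4])
    print(safe_divide(true_positives, predicted_positives))  # [0.5  0.   0.75]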

otx.api.usecases.evaluation.basic_operations.get_intersections_and_cardinalities(references: List[ndarray], predictions: List[ndarray], labels: List[LabelEntity]) Tuple[Dict[Optional[LabelEntity], int], Dict[Optional[LabelEntity], int]]#

Returns all intersections and cardinalities between reference masks and prediction masks.

Intersections and cardinalities are each returned in a dictionary mapping each label to its corresponding number of intersection/cardinality pixels.

Args:

references (List[np.ndarray]): reference masks, one mask per image
predictions (List[np.ndarray]): prediction masks, one mask per image
labels (List[LabelEntity]): labels in the input masks

Returns:

Tuple[NumberPerLabel, NumberPerLabel]: (all_intersections, all_cardinalities)
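
A simplified sketch of the accumulation this function performs, using integer label ids in place of LabelEntity objects (the helper name and mask encoding are assumptions for illustration):

    import numpy as np

    def intersections_and_cardinalities(references, predictions, label_ids):
        """Accumulate per-label intersection and cardinality pixel counts over a dataset."""
        intersections = {label: 0 for label in label_ids}
        cardinalities = {label: 0 for label in label_ids}
        intersections[None] = 0  # the None key aggregates over all labels
        cardinalities[None] = 0

        for reference, prediction in zip(references, predictions):
            for label in label_ids:
                reference_mask = reference == label
                prediction_mask = prediction == label
                intersection = int(np.logical_and(reference_mask, prediction_mask).sum())
                cardinality = int(reference_mask.sum() + prediction_mask.sum())
                intersections[label] += intersection
                cardinalities[label] += cardinality
                intersections[None] += intersection
                cardinalities[None] += cardinality
        return intersections, cardinalities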

otx.api.usecases.evaluation.basic_operations.intersection_box(box1: Rectangle, box2: Rectangle) Optional[List[float]]#

Calculate the intersection box of two bounding boxes.

Args:

box1: a Rectangle that represents the first bounding box
box2: a Rectangle that represents the second bounding box

Returns:

the coordinates of the intersection box if the inputs have a valid intersection, else None

otx.api.usecases.evaluation.basic_operations.intersection_over_union(box1: Rectangle, box2: Rectangle, intersection: Optional[List[float]] = None) float#

Calculate the Intersection over Union (IoU) of two bounding boxes.

Args:

box1: a Rectangle representing the first bounding box
box2: a Rectangle representing the second bounding box
intersection: precomputed intersection of the two boxes (see the intersection_box function), if it exists

Returns:

intersection-over-union of box1 and box2

otx.api.usecases.evaluation.basic_operations.precision_per_class(matrix: ndarray) ndarray#

Compute the precision per class based on the confusion matrix.

Args:

matrix: the computed confusion matrix

Returns:

the precision (per class), defined as TP/(TP+FP)

otx.api.usecases.evaluation.basic_operations.recall_per_class(matrix: ndarray) ndarray#

Compute the recall per class based on the confusion matrix.

Args:

matrix: the computed confusion matrix

Returns:

the recall (per class), defined as TP/(TP+FN)
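
Both helpers reduce to row and column sums of the confusion matrix. The sketch below shows the idea with NumPy, assuming rows index the true class and columns the predicted class (the orientation is an assumption, not taken from the source):

    import numpy as np

    def precision_per_class(matrix: np.ndarray) -> np.ndarray:
        # Precision = TP / (TP + FP): diagonal divided by column sums.
        true_positives = np.diag(matrix).astype(float)
        predicted_totals = matrix.sum(axis=0).astype(float)
        return np.divide(true_positives, predicted_totals,
                         out=np.zeros_like(true_positives), where=predicted_totals != 0)

    def recall_per_class(matrix: np.ndarray) -> np.ndarray:
        # Recall = TP / (TP + FN): diagonal divided by row sums.
        true_positives = np.diag(matrix).astype(float)
        actual_totals = matrix.sum(axis=1).astype(float)
        return np.divide(true_positives, actual_totals,
                         out=np.zeros_like(true_positives), where=actual_totals != 0)

    confusion = np.array([[5, 1, 0],
                          [2, 3, 0],
                          [0, 0, 4]])
    print(precision_per_class(confusion))  # [0.714... 0.75 1.  ]
    print(recall_per_class(confusion))     # [0.833... 0.6  1.  ]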

This module contains the Dice performance provider.

class otx.api.usecases.evaluation.dice.DiceAverage(resultset: ResultSetEntity, average: MetricAverageMethod = MetricAverageMethod.MACRO)#

Computes the average Dice coefficient overall and for individual labels.

See https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient for background information.

To compute the Dice coefficient, the shapes in the dataset items of the prediction and ground truth datasets are first converted to masks.

Dice is computed from the intersection and union over the whole dataset, instead of computing the intersection and union for individual images and then averaging.

Args:

resultset (ResultSetEntity): ResultSet that the score will be computed for
average (MetricAverageMethod): One of

  • MICRO: every pixel has the same weight, regardless of label

  • MACRO: compute score per label, return the average of the per-label scores

classmethod compute_dice_using_intersection_and_cardinality(all_intersection: Dict[Optional[LabelEntity], int], all_cardinality: Dict[Optional[LabelEntity], int], average: MetricAverageMethod) Tuple[ScoreMetric, Dict[LabelEntity, ScoreMetric]]#

Computes dice score using intersection and cardinality dictionaries.

Both dictionaries must contain the same set of keys. Dice score is computed by: 2 * intersection / cardinality

Args:

average: averaging method to use
all_intersection: collection of intersections per label
all_cardinality: collection of cardinalities per label

Returns:

A tuple containing the overall Dice score and the per-label Dice scores

Raises:

KeyError: if the keys in intersection and cardinality do not match
KeyError: if the key None is not present in either all_intersection or all_cardinality
ValueError: if the intersection for a certain key is larger than its corresponding cardinality
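
A worked sketch of the 2 * intersection / cardinality formula, with string keys standing in for LabelEntity objects and made-up pixel counts:

    # Hypothetical pixel counts; the None key holds the aggregate over all labels.
    all_intersection = {"car": 800, "person": 300, None: 1100}
    all_cardinality = {"car": 2000, "person": 1000, None: 3000}

    # Per-label Dice score: 2 * intersection / cardinality.
    dice_per_label = {
        label: 2 * all_intersection[label] / all_cardinality[label]
        for label in all_intersection
        if label is not None and all_cardinality[label] > 0
    }
    # {'car': 0.8, 'person': 0.6}

    # MICRO averaging uses the aggregated None entry; MACRO averages the per-label scores.
    micro_dice = 2 * all_intersection[None] / all_cardinality[None]  # ~0.733
    macro_dice = sum(dice_per_label.values()) / len(dice_per_label)  # 0.7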

property dice_per_label: Dict[LabelEntity, ScoreMetric]#

Returns a dictionary mapping the label to its corresponding dice score (as ScoreMetric).

get_performance() Performance#

Returns the performance of the resultset.

property overall_dice: ScoreMetric#

Returns the dice average as ScoreMetric.

This module contains the f-measure performance provider class.

class otx.api.usecases.evaluation.f_measure.FMeasure(resultset: ResultSetEntity, vary_confidence_threshold: bool = False, vary_nms_threshold: bool = False, cross_class_nms: bool = False)#

Computes the f-measure (also known as F1-score) for a resultset.

The f-measure is typically used in detection (localization) tasks to obtain a single number that balances precision and recall.

To determine whether a predicted box matches a ground truth box, an overlap measure based on a minimum intersection-over-union (IoU) is used; by default, a value of 0.5 is used.

In addition, spurious results are eliminated by applying non-max suppression (NMS), so that two predicted boxes with IoU > threshold are reduced to one. This threshold can be determined automatically by setting vary_nms_threshold to True.

Args:

resultset (ResultSetEntity): ResultSet entity used for calculating the F-measure.
vary_confidence_threshold (bool): If True, the maximal F-measure is determined by optimizing over different confidence threshold values. Defaults to False.
vary_nms_threshold (bool): If True, the maximal F-measure is determined by optimizing over different NMS threshold values. Defaults to False.
cross_class_nms (bool): Whether non-max suppression should be applied cross-class. If True, this will eliminate boxes with sufficient overlap even if they are from different classes. Defaults to False.

Raises:

ValueError: if prediction dataset and ground truth dataset are empty

property best_confidence_threshold: Optional[ScoreMetric]#

Returns best confidence threshold as ScoreMetric if exists.

property best_nms_threshold: Optional[ScoreMetric]#

Returns the best NMS threshold as ScoreMetric if exists.

box_class_index = 4#
box_score_index = 5#
property f_measure: ScoreMetric#

Returns the f-measure as ScoreMetric.

property f_measure_per_confidence: Optional[CurveMetric]#

Returns the curve for f-measure per confidence as CurveMetric if exists.

property f_measure_per_label: Dict[LabelEntity, ScoreMetric]#

Returns the f-measure per label as dictionary (Label -> ScoreMetric).

property f_measure_per_nms: Optional[CurveMetric]#

Returns the curve for f-measure per nms threshold as CurveMetric if exists.

get_performance() Performance#

Returns the performance which consists of the F-Measure score and the dashboard metrics.

Returns:

Performance: Performance object containing the F-Measure score and the dashboard metrics.

otx.api.usecases.evaluation.f_measure.bounding_box_intersection_over_union(box1: Tuple[float, float, float, float, str, float], box2: Tuple[float, float, float, float, str, float]) float#

Calculate the Intersection over Union (IoU) of two bounding boxes.

Args:

box1 (Tuple[float, float, float, float, str, float]): (x1, y1, x2, y2, class, score)
box2 (Tuple[float, float, float, float, str, float]): (x1, y1, x2, y2, class, score)

Raises:

ValueError: In case the IoU is outside of [0.0, 1.0]

Returns:

float: Intersection-over-union of box1 and box2.
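
A self-contained sketch of IoU on the tuple format documented above (an illustration of the computation, not the library implementation):

    from typing import Tuple

    Box = Tuple[float, float, float, float, str, float]  # (x1, y1, x2, y2, class, score)

    def iou(box1: Box, box2: Box) -> float:
        """Intersection over union of two axis-aligned boxes."""
        x_left = max(box1[0], box2[0])
        y_top = max(box1[1], box2[1])
        x_right = min(box1[2], box2[2])
        y_bottom = min(box1[3], box2[3])
        if x_right <= x_left or y_bottom <= y_top:
            return 0.0  # no overlap
        intersection = (x_right - x_left) * (y_bottom - y_top)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        return intersection / (area1 + area2 - intersection)

    print(iou((0.0, 0.0, 2.0, 2.0, "car", 0.9),
              (1.0, 1.0, 3.0, 3.0, "car", 0.8)))  # 1 / 7 ≈ 0.143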

otx.api.usecases.evaluation.f_measure.get_iou_matrix(ground_truth: List[Tuple[float, float, float, float, str, float]], predicted: List[Tuple[float, float, float, float, str, float]]) ndarray#

Constructs an IoU matrix of shape [num_ground_truth_boxes, num_predicted_boxes].

Each cell (x, y) in the IoU matrix contains the intersection over union of ground truth box x and predicted box y. An IoU matrix corresponds to a single image.

Args:
ground_truth (List[Tuple[float, float, float, float, str, float]]): List of ground truth boxes for a single image. Each box is a tuple of (x1, y1, x2, y2, class, score).

predicted (List[Tuple[float, float, float, float, str, float]]): List of predicted boxes for the same image, in the same (x1, y1, x2, y2, class, score) format.

Returns:

np.ndarray: IoU matrix of shape [ground_truth_boxes, predicted_boxes]
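
A sketch of the pairwise construction, reusing the iou helper from the previous sketch (assumed to be in scope):

    import numpy as np

    def iou_matrix(ground_truth, predicted):
        """Pairwise IoU: ground truth boxes index the rows, predicted boxes the columns."""
        matrix = np.zeros((len(ground_truth), len(predicted)))
        for row, gt_box in enumerate(ground_truth):
            for column, pred_box in enumerate(predicted):
                matrix[row, column] = iou(gt_box, pred_box)  # `iou` from the sketch above
        return matrix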

otx.api.usecases.evaluation.f_measure.get_n_false_negatives(iou_matrix: ndarray, iou_threshold: float) int#

Get the number of false negatives inside the IoU matrix for a given threshold.

The first loop accounts for all the ground truth boxes that do not have a high enough IoU with any predicted box (they go undetected). The second loop accounts for the much rarer case where two ground truth boxes are detected by the same predicted box. The principle is that each ground truth box requires a unique prediction box.

Args:

iou_matrix (np.ndarray): IoU matrix of shape [ground_truth_boxes, predicted_boxes]
iou_threshold (float): IoU threshold to use for the false negatives

Returns:

int: Number of false negatives
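
A sketch of the two-pass counting logic described above (a simplified illustration, not the exact library implementation):

    import numpy as np

    def n_false_negatives(iou_matrix: np.ndarray, iou_threshold: float) -> int:
        n_false = 0

        # Pass 1: ground truth boxes with no prediction above the threshold go undetected.
        for gt_row in iou_matrix:
            if gt_row.max() < iou_threshold:
                n_false += 1

        # Pass 2: if a single predicted box overlaps several ground truth boxes above the
        # threshold, only one of them can claim it; the others count as missed.
        for pred_column in iou_matrix.T:
            n_matched = int((pred_column >= iou_threshold).sum())
            n_false += max(n_matched - 1, 0)
        return n_false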

otx.api.usecases.evaluation.f_measure.intersection_box(box1: Tuple[float, float, float, float, str, float], box2: Tuple[float, float, float, float, str, float]) Tuple[float, float, float, float]#

Calculate the intersection box of two bounding boxes.

Args:

box1 (Tuple[float, float, float, float, str, float]): (x1, y1, x2, y2, class, score)
box2 (Tuple[float, float, float, float, str, float]): (x1, y1, x2, y2, class, score)

Returns:

Tuple[float, float, float, float]: (x_left, x_right, y_bottom, y_top)

Helper functions for computing metrics.

This module contains the helper functions which can be called directly by algorithm implementers to obtain the metrics.

class otx.api.usecases.evaluation.metrics_helper.MetricsHelper#

Contains metrics computation functions.

TODO: subject for refactoring.

static compute_accuracy(resultset: ResultSetEntity, average: MetricAverageMethod = MetricAverageMethod.MICRO) Accuracy#

Compute the Accuracy on a resultset, averaged over the different label groups.

Args:

resultset: The resultset used to compute the accuracy
average: The averaging method, either MICRO or MACRO

Returns:

Accuracy object
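
A usage sketch, assuming a populated classification ResultSetEntity (the resultset variable is hypothetical):

    from otx.api.usecases.evaluation.averaging import MetricAverageMethod
    from otx.api.usecases.evaluation.metrics_helper import MetricsHelper

    # `resultset` is assumed to be a populated ResultSetEntity for a classification task.
    accuracy = MetricsHelper.compute_accuracy(resultset, average=MetricAverageMethod.MICRO)
    print(accuracy.accuracy.value)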

static compute_anomaly_detection_scores(resultset: ResultSetEntity) AnomalyDetectionScores#

Compute the anomaly localization performance metrics on an anomaly detection resultset.

Args:

resultset: The resultset used to compute the metrics

Returns:

AnomalyLocalizationScores object

static compute_anomaly_segmentation_scores(resultset: ResultSetEntity) AnomalySegmentationScores#

Compute the anomaly localization performance metrics on an anomaly segmentation resultset.

Args:

resultset: The resultset used to compute the metrics

Returns:

AnomalyLocalizationScores object

static compute_dice_averaged_over_pixels(resultset: ResultSetEntity, average: MetricAverageMethod = MetricAverageMethod.MACRO) DiceAverage#

Compute the Dice average on a resultset, averaged over the pixels.

Args:

resultset: The resultset used to compute the Dice average
average: The averaging method, either MICRO or MACRO

Returns:

DiceAverage object
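
A usage sketch for a segmentation resultset (the resultset variable is assumed, and label.name is used only for display):

    from otx.api.usecases.evaluation.averaging import MetricAverageMethod
    from otx.api.usecases.evaluation.metrics_helper import MetricsHelper

    # `resultset` is assumed to be a populated ResultSetEntity for a segmentation task.
    dice = MetricsHelper.compute_dice_averaged_over_pixels(
        resultset, average=MetricAverageMethod.MACRO
    )
    print(dice.overall_dice.value)
    for label, score in dice.dice_per_label.items():
        print(label.name, score.value)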

static compute_f_measure(resultset: ResultSetEntity, vary_confidence_threshold: bool = False, vary_nms_threshold: bool = False, cross_class_nms: bool = False) FMeasure#

Compute the F-Measure on a resultset given some parameters.

Args:

resultset: The resultset used to compute the f-measure
vary_confidence_threshold: Flag specifying whether the f-measure shall be computed for different confidence threshold values
vary_nms_threshold: Flag specifying whether the f-measure shall be computed for different NMS threshold values
cross_class_nms: Whether non-max suppression should be applied cross-class

Returns:

FMeasure object
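
A usage sketch for a detection resultset (the resultset variable is assumed):

    from otx.api.usecases.evaluation.metrics_helper import MetricsHelper

    # `resultset` is assumed to be a populated ResultSetEntity for a detection task.
    f_measure = MetricsHelper.compute_f_measure(
        resultset,
        vary_confidence_threshold=True,  # also search for the best confidence threshold
    )
    print(f_measure.f_measure.value)
    if f_measure.best_confidence_threshold is not None:
        print(f_measure.best_confidence_threshold.value)

    performance = f_measure.get_performance()  # F-measure score plus dashboard metrics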

This module contains the interface for performance providers.

class otx.api.usecases.evaluation.performance_provider_interface.IPerformanceProvider#

Interface for performance provider.

TODO: subject for refactoring.

abstract get_performance() Performance#

Returns the computed performance.