datumaro.plugins.validators

Classes

ClassificationValidator([task_type, ...])

A validator class for the classification task.

DetectionValidator([task_type, ...])

A validator class for the detection task.

SegmentationValidator([task_type, ...])

A validator class for the (instance) segmentation task.

class datumaro.plugins.validators.ClassificationValidator(task_type=TaskType.classification, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class for the classification task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class. The user is warned when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute. The user is warned when annotations are unevenly distributed.

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev. The user is warned when values fall too far above or below the mean.

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total. The user is warned when the dominance ratio exceeds this threshold.

  • topk_bins (float) – ratio of the selected bins, those holding the most items, to the total number of bins. The user is warned when values are not evenly distributed.
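To make the first two thresholds concrete, here is a minimal standalone sketch of how a few-samples check and an imbalance-ratio check could be applied to a list of class labels. It does not call Datumaro. The function name `check_label_distribution`, the `>=` comparison, and the warning strings are assumptions made for illustration, not the library's implementation.

```python
from collections import Counter


def check_label_distribution(labels, few_samples_thr, imbalance_ratio_thr):
    """Return warning strings for a list of class labels (illustrative only)."""
    counts = Counter(labels)
    warnings = []
    if not counts:
        return warnings
    # Few-samples check: flag every class with fewer samples than the threshold.
    for name, count in sorted(counts.items()):
        if count < few_samples_thr:
            warnings.append(f"few samples: {name} ({count})")
    # Imbalance check: compare the largest class with the smallest one.
    majority, minority = max(counts.values()), min(counts.values())
    ratio = majority / minority
    if ratio >= imbalance_ratio_thr:  # assumed comparison; the library may differ
        warnings.append(f"imbalanced: ratio {ratio:.1f}")
    return warnings


labels = ["cat"] * 40 + ["dog"] * 8 + ["bird"] * 2
print(check_label_distribution(labels, few_samples_thr=5, imbalance_ratio_thr=10))
```

With these inputs, "bird" falls below the few-samples threshold and the 40-to-2 ratio between the largest and smallest class exceeds the imbalance threshold, so both warnings are produced.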

compute_statistics(dataset)

Computes statistics of the dataset for the classification task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for.

Returns:

stats – a dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for classification tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset, as returned by compute_statistics().

Returns:

reports – a list of validation reports (DatasetValidationError).

Return type:

list
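The two methods are meant to be chained: statistics are computed once from the dataset, then reports are derived from those statistics. The sketch below shows that flow using only the two documented method names. `run_validation`, `StubValidator`, and the `"total_items"` key are hypothetical names introduced so the example runs without Datumaro installed; they say nothing about the real structure of the statistics dict.

```python
def run_validation(validator, dataset):
    """Compute statistics once, then derive the reports from them."""
    stats = validator.compute_statistics(dataset)
    reports = validator.generate_reports(stats)
    return stats, reports


class StubValidator:
    """Hypothetical stand-in exposing the same two methods; not part of Datumaro."""

    def compute_statistics(self, dataset):
        # Hypothetical statistic: just the number of items.
        return {"total_items": len(dataset)}

    def generate_reports(self, stats):
        # Report an empty dataset; otherwise there is nothing to report.
        return [] if stats["total_items"] else ["dataset is empty"]


stats, reports = run_validation(StubValidator(), dataset=[])
print(stats, reports)
```

Any object with these two methods, including one of the validator classes on this page, should fit the same `run_validation` helper.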

class datumaro.plugins.validators.DetectionValidator(task_type=TaskType.detection, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class for the detection task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class. The user is warned when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute. The user is warned when annotations are unevenly distributed.

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev. The user is warned when values fall too far above or below the mean.

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total. The user is warned when the dominance ratio exceeds this threshold.

  • topk_bins (float) – ratio of the selected bins, those holding the most items, to the total number of bins. The user is warned when values are not evenly distributed.
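As an illustration of the mean +/- m * stddev rule behind far_from_mean_thr, the following standalone sketch flags values that fall outside that range. The helper name `far_from_mean`, the use of the population standard deviation, and the sample box widths are assumptions for the example; the library may compute the bounds differently.

```python
from statistics import mean, pstdev


def far_from_mean(values, far_from_mean_thr):
    """Return the values lying outside mean +/- thr * stddev (illustrative only)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    lower = mu - far_from_mean_thr * sigma
    upper = mu + far_from_mean_thr * sigma
    return [v for v in values if v < lower or v > upper]


# Hypothetical bounding-box widths with one obvious outlier.
widths = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(far_from_mean(widths, far_from_mean_thr=2.0))
```

A smaller threshold narrows the accepted range and flags more values; a larger one tolerates more spread.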

compute_statistics(dataset)

Computes statistics of the dataset for the detection task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for.

Returns:

stats – a dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for detection tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset, as returned by compute_statistics().

Returns:

reports – a list of validation reports (DatasetValidationError).

Return type:

list

class datumaro.plugins.validators.SegmentationValidator(task_type=TaskType.segmentation, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: DetectionValidator

A validator class for the (instance) segmentation task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class. The user is warned when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute. The user is warned when annotations are unevenly distributed.

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev. The user is warned when values fall too far above or below the mean.

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total. The user is warned when the dominance ratio exceeds this threshold.

  • topk_bins (float) – ratio of the selected bins, those holding the most items, to the total number of bins. The user is warned when values are not evenly distributed.
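The interplay of topk_bins and dominance_ratio_thr can be sketched as follows: values are placed into equal-width bins, the fraction topk_bins selects how many of the fullest bins to consider, and the share of items in those bins is the dominance ratio. This is one plausible reading of the parameter descriptions, not the library's code. The function name `dominance_ratio`, the `n_bins` argument, and the sample values are hypothetical.

```python
def dominance_ratio(values, n_bins, topk_bins):
    """Share of items that fall into the fullest bins (illustrative only)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    # topk_bins is a fraction of the bin count; always keep at least one bin.
    topk = max(1, round(n_bins * topk_bins))
    top_total = sum(sorted(counts, reverse=True)[:topk])
    return top_total / len(values)


# Hypothetical mask areas: most are tiny, two are much larger.
areas = [1, 1, 2, 2, 2, 3, 50, 100]
ratio = dominance_ratio(areas, n_bins=10, topk_bins=0.1)
print(ratio, ratio > 0.7)  # compare against a hypothetical dominance_ratio_thr of 0.7
```

Here six of the eight values land in the first bin, so a single bin holds three quarters of the items and the ratio exceeds the hypothetical threshold.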

compute_statistics(dataset)

Computes statistics of the dataset for the segmentation task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for.

Returns:

stats – a dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for segmentation tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset, as returned by compute_statistics().

Returns:

reports – a list of validation reports (DatasetValidationError).

Return type:

list