

ClassificationValidator([task_type, ...])

A validator class specific to the classification task.

DetectionValidator([task_type, ...])

A validator class specific to the detection task.

SegmentationValidator([task_type, ...])

A validator class specific to the (instance) segmentation task.

TabularValidationStats(total_ann_count, ...)

TabularValidator([task_type, ...])

A validator class specific to tabular datasets.

class datumaro.plugins.validators.ClassificationValidator(task_type=TaskType.classification, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to the classification task.

  • few_samples_thr (int) – Minimum number of samples per class. Warns the user when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – Ratio of the majority attribute to the minority attribute. Warns the user when annotations are unevenly distributed.

  • far_from_mean_thr (float) – Constant m used to define the range mean ± m * stddev. Warns the user when values are too large or too small relative to that range.

  • dominance_ratio_thr (float) – Ratio of the top-k bins to the total. Warns the user when the dominance ratio exceeds this threshold.

  • topk_bins (float) – Ratio of the selected bins (those holding the most items) to the total number of bins. Warns the user when values are not evenly distributed.
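The first two thresholds can be illustrated with a small standalone sketch. It does not call Datumaro; the function name `check_label_distribution`, the message strings, and the choice of strict comparisons are assumptions made for illustration only.

```python
from collections import Counter


def check_label_distribution(labels, few_samples_thr, imbalance_ratio_thr):
    """Return warning strings for sparse or imbalanced classes (illustrative only)."""
    counts = Counter(labels)
    if not counts:
        return []

    warnings = []
    # Few-samples check: flag every class with fewer samples than the threshold.
    for name, n in sorted(counts.items()):
        if n < few_samples_thr:
            warnings.append(f"few samples in '{name}': {n}")

    # Imbalance check: compare the largest class with the smallest one.
    # Whether the real check is strict (>) or inclusive (>=) is an assumption here.
    ratio = max(counts.values()) / min(counts.values())
    if ratio > imbalance_ratio_thr:
        warnings.append(f"imbalanced labels: ratio {ratio:.1f}")
    return warnings


labels = ["cat"] * 90 + ["dog"] * 9 + ["bird"] * 1
for message in check_label_distribution(labels, few_samples_thr=5, imbalance_ratio_thr=50):
    print(message)
```

With these inputs only "bird" falls below the sample threshold, and the 90:1 ratio between "cat" and "bird" exceeds the imbalance threshold.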


Computes statistics of the dataset for the classification task.


Parameters:

  • dataset (IDataset object)

Returns:

stats (dict) – A dict object containing statistics of the dataset.


Validates the dataset for classification tasks based on its statistics.

Parameters:

  • dataset (IDataset object)

  • stats (Dict object)

Returns:

reports (list) – List of validation reports (DatasetValidationError).
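The split described above, statistics first and reports derived from those statistics, can be sketched without Datumaro. Everything in this example is hypothetical: the function names `compute_stats` and `make_reports`, the dictionary keys, and the report strings are stand-ins, and the real statistics dict and DatasetValidationError objects may look quite different.

```python
from collections import Counter


def compute_stats(items):
    """Summarize (item_id, labels) pairs into a plain dict of statistics."""
    label_counts = Counter(label for _, labels in items for label in labels)
    missing = [item_id for item_id, labels in items if not labels]
    return {
        "total_ann_count": sum(label_counts.values()),
        "items_missing_annotation": missing,
        "label_distribution": dict(label_counts),
    }


def make_reports(stats, few_samples_thr):
    """Turn the statistics dict into a list of human-readable reports."""
    reports = []
    for item_id in stats["items_missing_annotation"]:
        reports.append(f"item '{item_id}' has no annotation")
    for label, count in sorted(stats["label_distribution"].items()):
        if count < few_samples_thr:
            reports.append(f"label '{label}' has only {count} sample(s)")
    return reports


items = [("a", ["cat"]), ("b", ["cat"]), ("c", []), ("d", ["dog"])]
stats = compute_stats(items)
for line in make_reports(stats, few_samples_thr=2):
    print(line)
```

Keeping the two steps separate means the statistics can be inspected or stored on their own, and the reporting thresholds can be changed without rescanning the data.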

class datumaro.plugins.validators.DetectionValidator(task_type=TaskType.detection, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to the detection task.

  • few_samples_thr (int) – Minimum number of samples per class. Warns the user when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – Ratio of the majority attribute to the minority attribute. Warns the user when annotations are unevenly distributed.

  • far_from_mean_thr (float) – Constant m used to define the range mean ± m * stddev. Warns the user when values are too large or too small relative to that range.

  • dominance_ratio_thr (float) – Ratio of the top-k bins to the total. Warns the user when the dominance ratio exceeds this threshold.

  • topk_bins (float) – Ratio of the selected bins (those holding the most items) to the total number of bins. Warns the user when values are not evenly distributed.
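The mean ± m * stddev rule behind far_from_mean_thr can be sketched as follows. This is a standalone illustration, not Datumaro code: the helper name `far_from_mean`, the use of the population standard deviation, and the idea of applying it to bounding-box areas are all assumptions.

```python
from statistics import mean, pstdev


def far_from_mean(values, far_from_mean_thr):
    """Return the values lying outside mean +/- far_from_mean_thr * stddev."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    low = mu - far_from_mean_thr * sigma
    high = mu + far_from_mean_thr * sigma
    return [v for v in values if v < low or v > high]


# Hypothetical bounding-box areas with one unusually large box.
areas = [100, 110, 95, 105, 90, 1000]
print(far_from_mean(areas, far_from_mean_thr=2.0))
```

A larger threshold widens the accepted range, so fewer values are reported; with a threshold of 3.0 the same data produces no outliers.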


Computes statistics of the dataset for the detection task.


Parameters:

  • dataset (IDataset object)

Returns:

stats (dict) – A dict object containing statistics of the dataset.


Validates the dataset for detection tasks based on its statistics.

Parameters:

  • dataset (IDataset object)

  • stats (Dict object)

Returns:

reports (list) – List of validation reports (DatasetValidationError).

class datumaro.plugins.validators.SegmentationValidator(task_type=TaskType.segmentation, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: DetectionValidator

A validator class specific to the (instance) segmentation task.

  • few_samples_thr (int) – Minimum number of samples per class. Warns the user when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – Ratio of the majority attribute to the minority attribute. Warns the user when annotations are unevenly distributed.

  • far_from_mean_thr (float) – Constant m used to define the range mean ± m * stddev. Warns the user when values are too large or too small relative to that range.

  • dominance_ratio_thr (float) – Ratio of the top-k bins to the total. Warns the user when the dominance ratio exceeds this threshold.

  • topk_bins (float) – Ratio of the selected bins (those holding the most items) to the total number of bins. Warns the user when values are not evenly distributed.
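One way to read topk_bins together with dominance_ratio_thr is sketched below: pick the fullest fraction of histogram bins, then compare their share of all items with the dominance threshold. This is an interpretation of the parameter descriptions rather than Datumaro's implementation; the helper name `dominance_ratio`, the rounding rule, and the sample histogram are assumptions.

```python
def dominance_ratio(bin_counts, topk_bins):
    """Share of all items that fall into the fullest `topk_bins` fraction of bins."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    # Number of bins to select; rounding and the minimum of one bin are assumptions.
    n_top = max(1, round(len(bin_counts) * topk_bins))
    top = sorted(bin_counts, reverse=True)[:n_top]
    return sum(top) / total


# Hypothetical 10-bin histogram, e.g. of mask areas, with one dominant bin.
hist = [1, 2, 85, 5, 3, 2, 1, 0, 1, 0]
ratio = dominance_ratio(hist, topk_bins=0.2)
is_dominant = ratio > 0.8  # 0.8 plays the role of dominance_ratio_thr
print(ratio, is_dominant)
```

Here the two fullest bins hold 90 of the 100 items, so the distribution would be flagged as dominated by a few bins.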


Computes statistics of the dataset for the segmentation task.


Parameters:

  • dataset (IDataset object)

Returns:

stats (dict) – A dict object containing statistics of the dataset.


Validates the dataset for segmentation tasks based on its statistics.

Parameters:

  • dataset (IDataset object)

  • stats (Dict object)

Returns:

reports (list) – List of validation reports (DatasetValidationError).

class datumaro.plugins.validators.TabularValidationStats(total_ann_count: int = 0, items_missing_annotation: List[Any] = <factory>)

Bases: object

total_ann_count: int = 0

items_missing_annotation: List[Any]

classmethod create_with_dataset(dataset)
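The `<factory>` default in the signature above is how Python renders a dataclass field declared with a default factory, so each instance gets its own list. The stand-in class below mirrors only the two documented fields; the name `StatsSketch` is hypothetical, and nothing here reproduces what create_with_dataset does.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StatsSketch:
    """Hypothetical stand-in with the same two fields as TabularValidationStats."""

    total_ann_count: int = 0
    # default_factory creates a fresh list per instance instead of sharing one.
    items_missing_annotation: List[Any] = field(default_factory=list)


a, b = StatsSketch(), StatsSketch()
a.total_ann_count += 3
a.items_missing_annotation.append(("item_1", "train"))

print(a)
print(b)
```

Mutating the list on one instance leaves the other untouched, which is the reason a factory is used instead of a literal `[]` default.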

class datumaro.plugins.validators.TabularValidator(task_type=TaskType.tabular, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to tabular datasets.

  • few_samples_thr (int) – Minimum number of samples per class. Warns the user when a class has fewer samples than this threshold.

  • imbalance_ratio_thr (int) – Ratio of the majority attribute to the minority attribute. Warns the user when annotations are unevenly distributed.

  • far_from_mean_thr (float) – Constant m used to define the range mean ± m * stddev. Warns the user when values are too large or too small relative to that range.

  • dominance_ratio_thr (float) – Ratio of the top-k bins to the total. Warns the user when the dominance ratio exceeds this threshold.

  • topk_bins (float) – Ratio of the selected bins (those holding the most items) to the total number of bins. Warns the user when values are not evenly distributed.


Computes statistics of the tabular dataset.


Parameters:

  • dataset (IDataset object)

Returns:

stats (dict) – A dict object containing statistics of the dataset.


Validates the tabular dataset based on its statistics.

Parameters:

  • dataset (IDataset object)

  • stats (Dict object)

Returns:

reports (list) – List of validation reports (DatasetValidationError).