datumaro.plugins.validators

Classes

ClassificationValidator([task_type, ...])

A validator class specific to the classification task.

DetectionValidator([task_type, ...])

A validator class specific to the detection task.

SegmentationValidator([task_type, ...])

A validator class specific to the (instance) segmentation task.

TabularValidationStats(total_ann_count, ...)

TabularValidator([task_type, ...])

A validator class specific to tabular datasets.

class datumaro.plugins.validators.ClassificationValidator(task_type=TaskType.classification, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to the classification task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold

  • imbalance_ratio_thr (int) – ratio of majority attribute to minority attribute; warns the user when annotations are unevenly distributed

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev; warns the user when values are too large or too small

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold

  • topk_bins (float) – ratio of the selected bins (those with the most items) to the total number of bins; warns the user when values are not evenly distributed
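
To make the first two thresholds concrete, here is a minimal pure-Python sketch of a few-samples check and an imbalance check over a list of class labels. It is an illustration only, not Datumaro's implementation: the function name `check_label_distribution`, the warning strings, and the choice of strict comparisons are all assumptions.

```python
from collections import Counter


def check_label_distribution(labels, few_samples_thr, imbalance_ratio_thr):
    """Return warning strings for rare labels and for an imbalanced distribution.

    Hypothetical helper for illustration; not part of Datumaro.
    """
    counts = Counter(labels)
    warnings = []

    # Few-samples check: flag every label with fewer samples than the threshold.
    for label, count in sorted(counts.items()):
        if count < few_samples_thr:
            warnings.append(f"few samples: {label} ({count})")

    # Imbalance check: ratio of the most frequent label to the least frequent one.
    if counts:
        ratio = max(counts.values()) / min(counts.values())
        if ratio > imbalance_ratio_thr:
            warnings.append(f"imbalanced labels: ratio {ratio:.1f}")

    return warnings


labels = ["cat"] * 60 + ["dog"] * 30 + ["bird"] * 1
print(check_label_distribution(labels, few_samples_thr=2, imbalance_ratio_thr=50))
```

With these inputs, "bird" falls below the sample threshold and the 60:1 ratio exceeds the imbalance threshold, so both warnings are produced.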

compute_statistics(dataset)

Computes statistics of the dataset for the classification task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

A dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for classification tasks based on its statistics.

Parameters:

stats (Dict object) – statistics of the dataset, as returned by compute_statistics

Returns:

List of validation reports (DatasetValidationError).

Return type:

list
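
The two methods are meant to be chained: the dict returned by compute_statistics is the input to generate_reports. The sketch below shows that flow using only those two documented method names. `validate_in_two_steps`, `StubValidator`, and the `total_items` key are hypothetical stand-ins so the example runs without Datumaro installed; with the real library one would presumably pass a ClassificationValidator instance and a loaded dataset instead.

```python
def validate_in_two_steps(validator, dataset):
    """Run the two documented steps and return both results."""
    stats = validator.compute_statistics(dataset)  # dict of dataset statistics
    reports = validator.generate_reports(stats)    # list of validation reports
    return stats, reports


class StubValidator:
    """Hypothetical stand-in exposing the same two method names."""

    def compute_statistics(self, dataset):
        # The real statistics dict has a different, richer layout.
        return {"total_items": len(dataset)}

    def generate_reports(self, stats):
        return ["empty dataset"] if stats["total_items"] == 0 else []


stats, reports = validate_in_two_steps(StubValidator(), dataset=[])
print(stats, reports)
```

Keeping the statistics step separate makes it possible to inspect or store the statistics before any reports are generated from them.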

class datumaro.plugins.validators.DetectionValidator(task_type=TaskType.detection, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to the detection task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold

  • imbalance_ratio_thr (int) – ratio of majority attribute to minority attribute; warns the user when annotations are unevenly distributed

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev; warns the user when values are too large or too small

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold

  • topk_bins (float) – ratio of the selected bins (those with the most items) to the total number of bins; warns the user when values are not evenly distributed
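
The far_from_mean_thr description can be read as an outlier rule: a value is suspicious when it lies outside mean +/- m * stddev. The following sketch applies that rule to a list of numbers such as bounding-box widths. It is an assumption-laden illustration, not the library's code: `find_far_from_mean` is a hypothetical name, and the use of the population standard deviation is a choice made here.

```python
from statistics import mean, pstdev


def find_far_from_mean(values, far_from_mean_thr):
    """Return the values lying outside mean +/- far_from_mean_thr * stddev.

    Hypothetical helper; uses the population standard deviation by assumption.
    """
    mu = mean(values)
    sigma = pstdev(values)
    lower = mu - far_from_mean_thr * sigma
    upper = mu + far_from_mean_thr * sigma
    return [v for v in values if v < lower or v > upper]


# Nine boxes of similar width and one much wider box.
widths = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(find_far_from_mean(widths, far_from_mean_thr=2.0))
```

Here only the width of 100 falls outside the allowed range, so it is the single value returned.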

compute_statistics(dataset)

Computes statistics of the dataset for the detection task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

A dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for detection tasks based on its statistics.

Parameters:

stats (Dict object) – statistics of the dataset, as returned by compute_statistics

Returns:

List of validation reports (DatasetValidationError).

Return type:

list

class datumaro.plugins.validators.SegmentationValidator(task_type=TaskType.segmentation, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: DetectionValidator

A validator class specific to the (instance) segmentation task.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold

  • imbalance_ratio_thr (int) – ratio of majority attribute to minority attribute; warns the user when annotations are unevenly distributed

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev; warns the user when values are too large or too small

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold

  • topk_bins (float) – ratio of the selected bins (those with the most items) to the total number of bins; warns the user when values are not evenly distributed
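
The last two parameters work together: topk_bins selects a fraction of the histogram bins (the fullest ones), and dominance_ratio_thr bounds the share of all values those bins may hold. The sketch below is one possible reading of that description, not Datumaro's implementation. The function name `dominance_ratio`, the `num_bins` parameter, and the equal-width binning scheme are assumptions introduced for illustration.

```python
import math


def dominance_ratio(values, num_bins, topk_bins):
    """Share of values that fall into the fullest `topk_bins` fraction of bins.

    Hypothetical helper; equal-width binning and `num_bins` are assumptions.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0

    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1

    # Number of bins to select, at least one.
    k = max(1, math.ceil(num_bins * topk_bins))
    topk_total = sum(sorted(counts, reverse=True)[:k])
    return topk_total / len(values)


areas = [1] * 8 + [2, 10]
ratio = dominance_ratio(areas, num_bins=10, topk_bins=0.1)
dominance_ratio_thr = 0.7
print(ratio, ratio > dominance_ratio_thr)
```

In this example a single bin holds eight of the ten values, so the ratio exceeds the threshold of 0.7 and a warning would be appropriate.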

compute_statistics(dataset)

Computes statistics of the dataset for the segmentation task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

A dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the dataset for segmentation tasks based on its statistics.

Parameters:

stats (Dict object) – statistics of the dataset, as returned by compute_statistics

Returns:

List of validation reports (DatasetValidationError).

Return type:

list

class datumaro.plugins.validators.TabularValidationStats(total_ann_count: int = 0, items_missing_annotation: List[Any] = <factory>)

Bases: object

total_ann_count: int = 0

items_missing_annotation: List[Any]

classmethod create_with_dataset(dataset)

to_dict()

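
The `<factory>` default in the signature suggests a dataclass field created with a default factory, so that each instance gets its own list. The sketch below mirrors only the two documented fields to show that behaviour. `StatsSketch` is a hypothetical class, not the Datumaro one, and `dataclasses.asdict` is used here as a stand-in; the real to_dict may produce a different layout.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class StatsSketch:
    """Hypothetical mirror of the documented fields; not the Datumaro class."""

    total_ann_count: int = 0
    # default_factory gives every instance a fresh list.
    items_missing_annotation: List[Any] = field(default_factory=list)


a = StatsSketch()
b = StatsSketch()
a.total_ann_count += 3
a.items_missing_annotation.append("item_1")

print(asdict(a))
print(b.items_missing_annotation)  # unaffected by changes to `a`
```
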
class datumaro.plugins.validators.TabularValidator(task_type=TaskType.tabular, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class specific to tabular datasets.

Parameters:
  • few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold

  • imbalance_ratio_thr (int) – ratio of majority attribute to minority attribute; warns the user when annotations are unevenly distributed

  • far_from_mean_thr (float) – constant m used to define the range mean +/- m * stddev; warns the user when values are too large or too small

  • dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold

  • topk_bins (float) – ratio of the selected bins (those with the most items) to the total number of bins; warns the user when values are not evenly distributed

compute_statistics(dataset)

Computes statistics of the tabular dataset.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

A dict object containing statistics of the dataset.

Return type:

dict

generate_reports(stats)

Validates the tabular dataset based on its statistics.

Parameters:

stats (Dict object) – statistics of the dataset, as returned by compute_statistics

Returns:

List of validation reports (DatasetValidationError).

Return type:

list