datumaro.plugins.validators

Classes

`ClassificationValidator`([task_type, ...])	A validator class for the classification task.
`DetectionValidator`([task_type, ...])	A validator class for the detection task.
`SegmentationValidator`([task_type, ...])	A validator class for the (instance) segmentation task.
`TabularValidationStats`(total_ann_count, ...)
`TabularValidator`([task_type, ...])	A validator class for tabular datasets.

class datumaro.plugins.validators.ClassificationValidator(task_type=TaskType.classification, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class for the classification task.

Parameters:

few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; warns the user when annotations are unevenly distributed
far_from_mean_thr (float) – the constant m in mean +/- m * stddev; warns the user when values are too large or too small
dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold
topk_bins (float) – ratio of the selected bins with the most items to the total number of bins; warns the user when values are not evenly distributed
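To make the first two thresholds concrete, the following is a minimal sketch of the kind of check they describe, written in plain Python. It is not Datumaro's implementation: the function name `check_label_distribution`, the warning strings, and the exact comparison operators are assumptions made for illustration.

```python
from collections import Counter


def check_label_distribution(labels, few_samples_thr, imbalance_ratio_thr):
    """Return warnings for rare classes and for an uneven label distribution.

    Hypothetical helper: it mirrors the parameter descriptions above,
    not the library's actual code.
    """
    counts = Counter(labels)
    warnings = []

    # few_samples_thr: flag every class with fewer samples than the threshold.
    for label, count in sorted(counts.items()):
        if count < few_samples_thr:
            warnings.append(f"few samples in label '{label}': {count}")

    # imbalance_ratio_thr: compare the most and least frequent classes.
    if counts:
        ratio = max(counts.values()) / min(counts.values())
        if ratio > imbalance_ratio_thr:
            warnings.append(f"imbalanced labels: ratio {ratio:.1f}")

    return warnings


labels = ["cat"] * 40 + ["dog"] * 38 + ["bird"] * 2
warnings = check_label_distribution(labels, few_samples_thr=5, imbalance_ratio_thr=10)
for message in warnings:
    print(message)
```

With this toy input, "bird" falls below the sample threshold and the 40:2 majority-to-minority ratio exceeds the imbalance threshold, so both warnings are produced.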

compute_statistics(dataset)

Computes statistics of the dataset for the classification task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

stats (dict) – a dict object containing statistics of the dataset

generate_reports(stats)

Validates the dataset for classification tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset

Returns:

reports (list) – list of validation reports (DatasetValidationError)

class datumaro.plugins.validators.DetectionValidator(task_type=TaskType.detection, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class for the detection task.

Parameters:

few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; warns the user when annotations are unevenly distributed
far_from_mean_thr (float) – the constant m in mean +/- m * stddev; warns the user when values are too large or too small
dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold
topk_bins (float) – ratio of the selected bins with the most items to the total number of bins; warns the user when values are not evenly distributed
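The `far_from_mean_thr` description can be read as an outlier test on a numeric property such as bounding-box width. The sketch below illustrates that reading only; the function name `far_from_mean`, the use of the population standard deviation, and the sample widths are assumptions, and the library may compute this differently.

```python
from statistics import mean, pstdev


def far_from_mean(values, far_from_mean_thr):
    """Return the values lying outside mean +/- m * stddev.

    Hypothetical helper; pstdev (population stddev) is an arbitrary choice.
    """
    mu = mean(values)
    sigma = pstdev(values)
    lower = mu - far_from_mean_thr * sigma
    upper = mu + far_from_mean_thr * sigma
    return [v for v in values if v < lower or v > upper]


# Made-up bounding-box widths in pixels, with one unusually wide box.
widths = [50, 52, 48, 51, 49, 50, 53, 47, 50, 400]
outliers = far_from_mean(widths, far_from_mean_thr=2.0)
print(outliers)
```

A larger `far_from_mean_thr` widens the accepted range and therefore flags fewer values.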

compute_statistics(dataset)

Computes statistics of the dataset for the detection task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

stats (dict) – a dict object containing statistics of the dataset

generate_reports(stats)

Validates the dataset for detection tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset

Returns:

reports (list) – list of validation reports (DatasetValidationError)

class datumaro.plugins.validators.SegmentationValidator(task_type=TaskType.segmentation, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: DetectionValidator

A validator class for the (instance) segmentation task.

Parameters:

few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; warns the user when annotations are unevenly distributed
far_from_mean_thr (float) – the constant m in mean +/- m * stddev; warns the user when values are too large or too small
dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold
topk_bins (float) – ratio of the selected bins with the most items to the total number of bins; warns the user when values are not evenly distributed
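`topk_bins` and `dominance_ratio_thr` work together: the values of a property are placed into histogram bins, the fullest fraction of bins is selected, and the share of items in those bins is compared with the threshold. The sketch below is one possible interpretation of that description. The function name `dominance_ratio`, the equal-width binning, the `n_bins` parameter, and the sample data are all assumptions rather than the library's behavior.

```python
def dominance_ratio(values, n_bins, topk_bins):
    """Share of items that fall into the fullest `topk_bins` fraction of bins.

    Hypothetical helper using equal-width bins between min and max.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values match

    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        counts[idx] += 1

    # Select the top-k bins, where k is a fraction of the number of bins.
    k = max(1, round(n_bins * topk_bins))
    top = sorted(counts, reverse=True)[:k]
    return sum(top) / len(values)


# Made-up mask areas: most items share one value, two lie elsewhere.
areas = [10] * 18 + [20, 90]
ratio = dominance_ratio(areas, n_bins=10, topk_bins=0.1)
dominance_ratio_thr = 0.8
is_dominated = ratio > dominance_ratio_thr
print(ratio, is_dominated)
```

Here a single bin out of ten holds 18 of the 20 items, so the ratio is above the illustrative threshold of 0.8 and the distribution would be reported as dominated.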

compute_statistics(dataset)

Computes statistics of the dataset for the segmentation task.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

stats (dict) – a dict object containing statistics of the dataset

generate_reports(stats)

Validates the dataset for segmentation tasks based on its statistics.

Parameters:

stats (dict) – statistics of the dataset

Returns:

reports (list) – list of validation reports (DatasetValidationError)

class datumaro.plugins.validators.TabularValidationStats(total_ann_count: int = 0, items_missing_annotation: List[Any] = <factory>)

Bases: object

total_ann_count: int = 0

items_missing_annotation: List[Any]

classmethod create_with_dataset(dataset)

to_dict()
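The `<factory>` default in the signature above is how Python renders a dataclass field declared with `default_factory`, which suggests that this class is a dataclass. The sketch below shows what such a declaration typically looks like. `StatsSketch` is a hypothetical stand-in, and its `to_dict` is only a guess at the obvious behavior; the real method may produce a different structure.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class StatsSketch:
    """Hypothetical stand-in with the same two fields as the class above."""

    total_ann_count: int = 0
    # default_factory gives every instance its own list; this is what
    # appears as `<factory>` in a generated signature.
    items_missing_annotation: List[Any] = field(default_factory=list)

    def to_dict(self):
        # Assumed behavior: a plain dict of field names to values.
        return asdict(self)


a = StatsSketch()
b = StatsSketch()
a.total_ann_count += 3
a.items_missing_annotation.append("item_1")

print(a.to_dict())
print(b.to_dict())
```

Because the list comes from a factory, appending to `a.items_missing_annotation` leaves `b` untouched, which would not hold if a single list object were shared as a default.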

class datumaro.plugins.validators.TabularValidator(task_type=TaskType.tabular, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)

Bases: _TaskValidator

A validator class for tabular datasets.

Parameters:

few_samples_thr (int) – minimum number of samples per class; warns the user when a class has fewer samples than this threshold
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; warns the user when annotations are unevenly distributed
far_from_mean_thr (float) – the constant m in mean +/- m * stddev; warns the user when values are too large or too small
dominance_ratio_thr (float) – ratio of the top-k bins to the total; warns the user when the dominance ratio exceeds this threshold
topk_bins (float) – ratio of the selected bins with the most items to the total number of bins; warns the user when values are not evenly distributed

compute_statistics(dataset)

Computes statistics of the tabular dataset.

Parameters:

dataset (IDataset object) – the dataset to compute statistics for

Returns:

stats (dict) – a dict object containing statistics of the dataset

generate_reports(stats)

Validates the tabular dataset based on its statistics.

Parameters:

stats (dict) – statistics of the dataset

Returns:

reports (list) – list of validation reports (DatasetValidationError)