datumaro.plugins.validators
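Classes
| Class | Description |
| ClassificationValidator | A specific validator class for classification tasks. |
| DetectionValidator | A specific validator class for detection tasks. |
| SegmentationValidator | A specific validator class for (instance) segmentation tasks. |
| TabularValidationStats | |
| TabularValidator | A specific validator class for tabular datasets. |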
- class datumaro.plugins.validators.ClassificationValidator(task_type=TaskType.classification, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)
Bases: _TaskValidator
A specific validator class for classification tasks.
- Parameters:
few_samples_thr (int) – minimum number of samples per class; the user is warned when a class has fewer samples than this threshold.
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; the user is warned when annotations are unevenly distributed.
far_from_mean_thr (float) – constant m that defines the range mean +/- m * stddev; the user is warned when values fall outside this range, i.e. are too large or too small.
dominance_ratio_thr (float) – ratio of the items in the Top-k bins to the total number of items; the user is warned when this dominance ratio exceeds the threshold.
topk_bins (float) – ratio of the selected bins (those containing the most items) to the total number of bins; the user is warned when values are not evenly distributed.
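To make the first two thresholds concrete, here is a minimal pure-Python sketch of the checks they describe. It is not Datumaro code: the helper names `few_sample_labels` and `is_imbalanced` are hypothetical, the imbalance check is applied to label counts for simplicity (the parameter above speaks of attributes), and the choice of strict comparisons and of ignoring empty classes are assumptions.

```python
from collections import Counter
from typing import List, Mapping


def few_sample_labels(label_counts: Mapping[str, int], few_samples_thr: int) -> List[str]:
    # Hypothetical helper: labels with fewer samples than the threshold would be reported.
    return sorted(label for label, n in label_counts.items() if n < few_samples_thr)


def is_imbalanced(label_counts: Mapping[str, int], imbalance_ratio_thr: float) -> bool:
    # Hypothetical helper: compare the most frequent label with the least frequent one.
    # Empty classes are ignored here (an assumption) to avoid dividing by zero.
    counts = [n for n in label_counts.values() if n > 0]
    if not counts:
        return False
    return max(counts) / min(counts) > imbalance_ratio_thr


counts = Counter({"cat": 120, "dog": 30, "bird": 1})
print(few_sample_labels(counts, few_samples_thr=2))   # ['bird']
print(is_imbalanced(counts, imbalance_ratio_thr=50))  # True, since 120 / 1 = 120 > 50
```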
- compute_statistics(dataset)
Computes statistics of the dataset for the classification task.
- Parameters:
dataset (IDataset object) – the dataset to analyze.
- Returns:
stats – a dict containing statistics of the dataset.
- Return type:
dict
- generate_reports(stats)
Validates the dataset for classification tasks based on its statistics.
- Parameters:
stats (dict) – statistics of the dataset, as returned by compute_statistics().
- Returns:
reports – a list of validation reports (DatasetValidationError).
- Return type:
list
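Each validator works in two steps: compute_statistics(dataset) gathers a dict of statistics, and generate_reports(stats) turns that dict into a list of reports. The toy class below mirrors only that split; it is a hypothetical illustration, not Datumaro's implementation. `ToyReport` stands in for DatasetValidationError, the dataset is assumed to be a plain list of `(item_id, label)` pairs, and the statistic keys and report kinds are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyReport:
    """Hypothetical stand-in for a DatasetValidationError report."""
    kind: str
    detail: str


class ToyClassificationValidator:
    """Hypothetical validator mirroring the compute_statistics / generate_reports split."""

    def __init__(self, few_samples_thr: int = 1):
        self.few_samples_thr = few_samples_thr

    def compute_statistics(self, dataset: List[Tuple[str, Optional[str]]]) -> Dict:
        # Step 1: walk the dataset once and collect raw counts only.
        label_counts: Dict[str, int] = {}
        missing: List[str] = []
        for item_id, label in dataset:
            if label is None:
                missing.append(item_id)
            else:
                label_counts[label] = label_counts.get(label, 0) + 1
        return {"label_distribution": label_counts, "items_missing_label": missing}

    def generate_reports(self, stats: Dict) -> List[ToyReport]:
        # Step 2: apply thresholds to the statistics; the dataset itself is not needed.
        reports = [ToyReport("missing_label", item_id) for item_id in stats["items_missing_label"]]
        for label, count in sorted(stats["label_distribution"].items()):
            if count < self.few_samples_thr:
                reports.append(ToyReport("few_samples", f"{label}: {count}"))
        return reports


validator = ToyClassificationValidator(few_samples_thr=2)
data = [("img1", "cat"), ("img2", "cat"), ("img3", "dog"), ("img4", None)]
stats = validator.compute_statistics(data)
reports = validator.generate_reports(stats)
for report in reports:
    print(report.kind, report.detail)
# missing_label img4
# few_samples dog: 1
```

In this sketch, one benefit of the split is that thresholds can be changed and reports regenerated from the same `stats` without scanning the dataset again.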
- class datumaro.plugins.validators.DetectionValidator(task_type=TaskType.detection, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)
Bases: _TaskValidator
A specific validator class for detection tasks.
- Parameters:
few_samples_thr (int) – minimum number of samples per class; the user is warned when a class has fewer samples than this threshold.
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; the user is warned when annotations are unevenly distributed.
far_from_mean_thr (float) – constant m that defines the range mean +/- m * stddev; the user is warned when values fall outside this range, i.e. are too large or too small.
dominance_ratio_thr (float) – ratio of the items in the Top-k bins to the total number of items; the user is warned when this dominance ratio exceeds the threshold.
topk_bins (float) – ratio of the selected bins (those containing the most items) to the total number of bins; the user is warned when values are not evenly distributed.
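The far_from_mean_thr parameter defines an acceptance band of mean +/- m * stddev. The sketch below flags values outside that band; `far_from_mean` is a hypothetical helper, the use of the population standard deviation (rather than the sample estimate) is an assumption, and the input values are illustrative numbers, not what the library actually measures.

```python
import statistics
from typing import List, Sequence


def far_from_mean(values: Sequence[float], far_from_mean_thr: float) -> List[float]:
    """Hypothetical helper: return values outside mean +/- m * stddev, with m = far_from_mean_thr."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    lower = mean - far_from_mean_thr * std
    upper = mean + far_from_mean_thr * std
    return [v for v in values if v < lower or v > upper]


areas = [10, 11, 9, 10, 12, 10, 100]  # illustrative values, e.g. box areas
print(far_from_mean(areas, 2.0))  # [100]
print(far_from_mean(areas, 3.0))  # []
```

Here the mean is about 23.1 and the standard deviation about 31.4, so with m = 2 the upper bound is about 85.9 and 100 is flagged; with m = 3 the bound rises to about 117.3 and nothing is flagged.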
- compute_statistics(dataset)
Computes statistics of the dataset for the detection task.
- Parameters:
dataset (IDataset object) – the dataset to analyze.
- Returns:
stats – a dict containing statistics of the dataset.
- Return type:
dict
- generate_reports(stats)
Validates the dataset for detection tasks based on its statistics.
- Parameters:
stats (dict) – statistics of the dataset, as returned by compute_statistics().
- Returns:
reports – a list of validation reports (DatasetValidationError).
- Return type:
list
- class datumaro.plugins.validators.SegmentationValidator(task_type=TaskType.segmentation, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)
Bases: DetectionValidator
A specific validator class for (instance) segmentation tasks.
- Parameters:
few_samples_thr (int) – minimum number of samples per class; the user is warned when a class has fewer samples than this threshold.
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; the user is warned when annotations are unevenly distributed.
far_from_mean_thr (float) – constant m that defines the range mean +/- m * stddev; the user is warned when values fall outside this range, i.e. are too large or too small.
dominance_ratio_thr (float) – ratio of the items in the Top-k bins to the total number of items; the user is warned when this dominance ratio exceeds the threshold.
topk_bins (float) – ratio of the selected bins (those containing the most items) to the total number of bins; the user is warned when values are not evenly distributed.
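The topk_bins and dominance_ratio_thr parameters work together: values are binned into a histogram, the most populated fraction of bins is selected, and the share of items in those bins is compared with the threshold. The sketch below is one plausible reading of that description, not the library's code; `dominance_ratio` is a hypothetical helper, and the equal-width binning, the bin count, and the rounding rule for k are all assumptions.

```python
from typing import List, Sequence


def dominance_ratio(values: Sequence[float], n_bins: int, topk_bins: float) -> float:
    """Hypothetical helper: share of all values falling into the k most populated bins.

    k = round(n_bins * topk_bins), at least 1 (the rounding rule is an assumption).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero when all values are equal
    counts: List[int] = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    k = max(1, round(n_bins * topk_bins))
    top = sorted(counts, reverse=True)[:k]
    return sum(top) / len(values)


values = [1] * 8 + [2, 10]
ratio = dominance_ratio(values, n_bins=10, topk_bins=0.1)
print(ratio)        # 0.8
print(ratio > 0.7)  # True: with dominance_ratio_thr=0.7 this distribution would be flagged
```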
- compute_statistics(dataset)
Computes statistics of the dataset for the segmentation task.
- Parameters:
dataset (IDataset object) – the dataset to analyze.
- Returns:
stats – a dict containing statistics of the dataset.
- Return type:
dict
- generate_reports(stats)
Validates the dataset for segmentation tasks based on its statistics.
- Parameters:
stats (dict) – statistics of the dataset, as returned by compute_statistics().
- Returns:
reports – a list of validation reports (DatasetValidationError).
- Return type:
list
- class datumaro.plugins.validators.TabularValidationStats(total_ann_count: int = 0, items_missing_annotation: List[Any] = <factory>)
Bases: object
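The `<factory>` default in this signature is how Python's `dataclasses` module displays a field declared with `default_factory`, which suggests (this page does not state it) that TabularValidationStats is a dataclass. The hypothetical re-declaration below, deliberately named `TabularValidationStatsSketch`, reproduces the documented signature and shows why a factory is used: each instance gets its own list. The `("row_7", "train")` entry is made-up example data.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class TabularValidationStatsSketch:
    """Hypothetical re-declaration matching the documented signature (not the library source)."""
    total_ann_count: int = 0
    # default_factory creates a fresh list per instance; dataclasses shows it as <factory>.
    items_missing_annotation: List[Any] = field(default_factory=list)


sig = inspect.signature(TabularValidationStatsSketch)
print(sig.parameters["items_missing_annotation"].default)  # <factory>

a = TabularValidationStatsSketch()
b = TabularValidationStatsSketch()
a.items_missing_annotation.append(("row_7", "train"))
print(b.items_missing_annotation)  # [] because the list is not shared between instances
```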
- class datumaro.plugins.validators.TabularValidator(task_type=TaskType.tabular, few_samples_thr=None, imbalance_ratio_thr=None, far_from_mean_thr=None, dominance_ratio_thr=None, topk_bins=None)
Bases: _TaskValidator
A specific validator class for tabular datasets.
- Parameters:
few_samples_thr (int) – minimum number of samples per class; the user is warned when a class has fewer samples than this threshold.
imbalance_ratio_thr (int) – ratio of the majority attribute to the minority attribute; the user is warned when annotations are unevenly distributed.
far_from_mean_thr (float) – constant m that defines the range mean +/- m * stddev; the user is warned when values fall outside this range, i.e. are too large or too small.
dominance_ratio_thr (float) – ratio of the items in the Top-k bins to the total number of items; the user is warned when this dominance ratio exceeds the threshold.
topk_bins (float) – ratio of the selected bins (those containing the most items) to the total number of bins; the user is warned when values are not evenly distributed.
- compute_statistics(dataset)
Computes statistics of the tabular dataset.
- Parameters:
dataset (IDataset object) – the dataset to analyze.
- Returns:
stats – a dict containing statistics of the dataset.
- Return type:
dict
- generate_reports(stats)
Validates the tabular dataset based on its statistics.
- Parameters:
stats (dict) – statistics of the dataset, as returned by compute_statistics().
- Returns:
reports – a list of validation reports (DatasetValidationError).
- Return type:
list