datumaro.plugins.transforms
Classes
| AnnsToLabels | Collects all labels from annotations (of all types) and transforms them into a set of annotations of type Label. |
| AstypeAnnotations | Enables the conversion of annotation types for the categories and individual items within a dataset. |
| BboxValuesDecrement | Subtracts one from the coordinates of bounding boxes. |
| BoxesToMasks | |
| BoxesToPolygons | |
| Clean | A class used to refine the media items in a dataset. |
| Correct | Corrects the dataset based on a validation report. |
| CropCoveredSegments | Sorts polygons and masks ("segments") according to z_order, crops covered areas of underlying segments. |
| IdFromImageName | Renames items in the dataset using the image file name (without extension). |
| MapSubsets | Renames subsets in the dataset. |
| MasksToPolygons | |
| MergeInstanceSegments | Replaces instance masks and, optionally, polygons with a single mask. |
| PolygonsToMasks | |
| ProjectInfos | Changes the content of infos. |
| ProjectLabels | Changes the order of labels in the dataset from the existing to the desired one, removes unknown labels and adds new labels. |
| RandomSplit | Joins all subsets into one and splits the result into several parts. |
| Reindex | Replaces dataset item IDs with sequential indices. |
| ReindexAnnotations | Replaces the IDs of dataset items' annotations with sequential indices. |
| RemapLabels | Changes labels in the dataset. |
| RemoveAnnotations | Removes annotations from specific dataset items. |
| RemoveAttributes | Removes item and annotation attributes in a dataset. |
| RemoveItems | Removes specific dataset items by their IDs. |
| Rename | Renames items in the dataset. |
| ResizeTransform | Resizes images and annotations in the dataset to the specified size. |
| ShapesToBoxes | |
| Sort | Sorts dataset items. |
- class datumaro.plugins.transforms.CropCoveredSegments(extractor: IDataset)
Bases: ItemTransform, CliPlugin
Sorts polygons and masks (“segments”) according to z_order, crops covered areas of underlying segments. If a segment is split into several independent parts by the segments above, produces the corresponding number of separate annotations joined into a group.
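To make the cropping step concrete, here is a minimal pure-Python sketch, not Datumaro's code. It assumes a hypothetical `Segment` type that stores a segment as a set of `(x, y)` pixels instead of a real polygon or mask, and it omits the step that splits a partially covered segment into separate grouped parts.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


@dataclass
class Segment:
    # Hypothetical stand-in for a polygon/mask annotation: a set of (x, y) pixels
    label: str
    z_order: int
    pixels: Set[Pixel]


def crop_covered(segments: List[Segment]) -> List[Segment]:
    """Remove from each segment the pixels covered by segments with a higher z_order."""
    # Walk from the top-most segment down, accumulating the area already covered
    ordered = sorted(segments, key=lambda s: s.z_order, reverse=True)
    covered: Set[Pixel] = set()
    result = []
    for seg in ordered:
        visible = seg.pixels - covered
        covered |= seg.pixels
        if visible:  # segments that are fully hidden are dropped in this sketch
            result.append(Segment(seg.label, seg.z_order, visible))
    # Return bottom-most first (ascending z_order)
    return sorted(result, key=lambda s: s.z_order)
```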
- class datumaro.plugins.transforms.MergeInstanceSegments(extractor, include_polygons=False)
Bases: ItemTransform, CliPlugin
Replaces instance masks and, optionally, polygons with a single mask. A group of annotations with the same group id is considered an “instance”. The largest annotation in the group is considered the group “head”, so the resulting mask takes properties from that annotation.
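The grouping rule can be illustrated with a small sketch. It is not Datumaro's implementation: the `InstanceMask` type and its pixel-set representation are hypothetical, and "largest" is assumed to mean the largest pixel area.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


@dataclass
class InstanceMask:
    # Hypothetical stand-in for a mask annotation
    label: str
    group: int
    pixels: Set[Pixel]


def merge_instances(masks: List[InstanceMask]) -> List[InstanceMask]:
    """Merge all masks sharing a group id into one mask; the largest member is the head."""
    groups: Dict[int, List[InstanceMask]] = defaultdict(list)
    for m in masks:
        groups[m.group].append(m)
    merged = []
    for group_id, members in groups.items():
        # The largest annotation (by pixel area) supplies the merged mask's label
        head = max(members, key=lambda m: len(m.pixels))
        union: Set[Pixel] = set()
        for m in members:
            union |= m.pixels
        merged.append(InstanceMask(head.label, group_id, union))
    return merged
```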
- class datumaro.plugins.transforms.PolygonsToMasks(extractor: IDataset)
Bases: ItemTransform, CliPlugin
- class datumaro.plugins.transforms.BoxesToMasks(extractor: IDataset)
Bases: ItemTransform, CliPlugin
- class datumaro.plugins.transforms.BoxesToPolygons(extractor: IDataset)
Bases: ItemTransform, CliPlugin
- class datumaro.plugins.transforms.MasksToPolygons(extractor: IDataset)
Bases: ItemTransform, CliPlugin
- class datumaro.plugins.transforms.ShapesToBoxes(extractor: IDataset)
Bases: ItemTransform, CliPlugin
- class datumaro.plugins.transforms.Reindex(extractor, start: int = 1)
Replaces dataset item IDs with sequential indices.
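A minimal sketch of sequential reindexing, assuming a hypothetical frozen `Item` with only `id` and `subset` fields and IDs stored as strings. It returns copies rather than mutating the input items.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Item:
    # Hypothetical minimal item: only the fields this sketch needs
    id: str
    subset: str


def reindex(items: List[Item], start: int = 1) -> List[Item]:
    """Return copies of the items whose ids are consecutive integers starting at `start`."""
    return [replace(item, id=str(start + i)) for i, item in enumerate(items)]
```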
- class datumaro.plugins.transforms.ReindexAnnotations(extractor, start: int = 1, reindex_each_item: bool = False)
Bases: ItemTransform, CliPlugin
Replaces the IDs of dataset items’ annotations with sequential indices.
- transform_item(item: DatasetItem) → DatasetItem
Returns a modified copy of the input item.
Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
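The effect of `reindex_each_item` can be sketched as follows. This reads the parameter name as "restart numbering at `start` for every item" and otherwise continues one dataset-wide counter; that interpretation, and the representation of items as lists of annotation dicts, are assumptions of this sketch.

```python
from typing import List


def reindex_annotation_ids(items: List[List[dict]], start: int = 1,
                           reindex_each_item: bool = False) -> List[List[dict]]:
    """Assign sequential ids to annotations; each item is a list of annotation dicts."""
    result = []
    counter = start
    for anns in items:
        if reindex_each_item:
            counter = start  # restart numbering for every item
        new_anns = []
        for ann in anns:
            new_anns.append({**ann, "id": counter})  # copy, do not mutate the input
            counter += 1
        result.append(new_anns)
    return result
```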
- class datumaro.plugins.transforms.MapSubsets(extractor, mapping=None)
Bases: ItemTransform, CliPlugin
Renames subsets in the dataset.
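A minimal sketch of subset renaming over `(item_id, subset)` pairs. It assumes that subsets absent from the mapping are left unchanged, which the description above does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple


def map_subsets(items: List[Tuple[str, str]],
                mapping: Optional[Dict[str, str]] = None) -> List[Tuple[str, str]]:
    """Rename the subset of each (item_id, subset) pair; unmapped subsets stay as-is."""
    mapping = mapping or {}
    return [(item_id, mapping.get(subset, subset)) for item_id, subset in items]
```

Mapping several old names to one new name (for example `val` and `valid` to `validation`) merges those subsets.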
- class datumaro.plugins.transforms.RandomSplit(extractor, splits, seed=None)
Joins all subsets into one and splits the result into several parts. Item IDs are expected to be unique, and subset ratios must sum up to 1.
Example:
random_split --subset train:.67 --subset test:.33
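The splitting idea can be sketched in pure Python. This is an illustration, not Datumaro's algorithm: it shuffles the IDs with a seeded local RNG, cuts the list by rounded ratios, and gives the remainder to the last subset; the real transform may distribute items differently.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def random_split(item_ids: Sequence[str], splits: List[Tuple[str, float]],
                 seed: Optional[int] = None) -> Dict[str, str]:
    """Assign each item id to a subset so that subset sizes follow the given ratios."""
    if len(set(item_ids)) != len(item_ids):
        raise ValueError("item ids must be unique")
    if abs(sum(ratio for _, ratio in splits) - 1.0) > 1e-6:
        raise ValueError("subset ratios must sum up to 1")
    ids = list(item_ids)
    random.Random(seed).shuffle(ids)  # a local RNG keeps the split reproducible
    assignment: Dict[str, str] = {}
    begin = 0
    for i, (subset, ratio) in enumerate(splits):
        if i == len(splits) - 1:
            end = len(ids)  # the last subset takes the remainder, so no item is lost
        else:
            end = min(len(ids), begin + round(ratio * len(ids)))
        for item_id in ids[begin:end]:
            assignment[item_id] = subset
        begin = end
    return assignment
```

Calling `random_split(ids, [("train", .67), ("test", .33)], seed=0)` mirrors the CLI example above.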
- class datumaro.plugins.transforms.IdFromImageName(extractor: IDataset)
Bases: ItemTransform, CliPlugin
Renames items in the dataset using image file name (without extension).
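A one-function sketch of deriving an ID from an image path, assuming that only the directory part and the last extension are removed.

```python
import os


def id_from_image_name(image_path: str) -> str:
    """Derive an item id from the image file name, dropping the directory and extension."""
    return os.path.splitext(os.path.basename(image_path))[0]
```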
- class datumaro.plugins.transforms.Rename(extractor, regex)
Bases: ItemTransform, CliPlugin
Renames items in the dataset. Supports regular expressions. The first character of the expression is the delimiter that separates the pattern and replacement parts. The replacement part can also contain str.format replacement fields, with the item (of type DatasetItem) object available. Please use double quotes to enclose the regex.
- Examples:
Replace ‘pattern’ with ‘replacement’:
rename -e "|pattern|replacement|"
Remove ‘frame_’ from item ids:
rename -e "|^frame_||"
Rename by regex:
rename -e "|frame_(\d+)_extra|{item.subset}_id_\1|"
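The expression format above can be sketched with the standard `re` module. This is an assumption-labeled illustration, not Datumaro's code: it applies `re.sub` to the item ID first and then `str.format` with `item`, and uses `SimpleNamespace` as a hypothetical stand-in for `DatasetItem`. A pattern that itself contains the delimiter character would need a different delimiter.

```python
import re
from types import SimpleNamespace


def rename_id(item, regex: str) -> str:
    """Apply a '|pattern|replacement|' expression to item.id, then fill {item...} fields."""
    delimiter = regex[0]  # the first character separates pattern and replacement
    pattern, replacement = regex[1:].split(delimiter)[:2]
    new_id = re.sub(pattern, replacement, item.id)  # \1 etc. refer to regex groups
    return new_id.format(item=item)  # then expand str.format fields such as {item.subset}


item = SimpleNamespace(id="frame_42_extra", subset="train")  # hypothetical item
print(rename_id(item, r"|frame_(\d+)_extra|{item.subset}_id_\1|"))  # train_id_42
```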
- class datumaro.plugins.transforms.RemapLabels(extractor: IDataset, mapping: Dict[str, str] | List[Tuple[str, str]], default: None | str | DefaultAction = None)
Bases: ItemTransform, CliPlugin
Changes labels in the dataset.
- A label can be:
renamed (and joined with an existing label) - when ‘--label <old_name>:<new_name>’ is specified
deleted - when ‘--label <name>:’ is specified, or the default action is ‘delete’ and the label is not mentioned in the list. When a label is deleted, all the associated annotations are removed
kept unchanged - when ‘--label <name>:<name>’ is specified, or the default action is ‘keep’ and the label is not mentioned in the list.
Annotations with no label are managed by the default action policy.
Examples:
Remove the ‘person’ label (and corresponding annotations):
remap_labels -l person: --default keep
Rename ‘person’ to ‘pedestrian’ and ‘human’ to ‘pedestrian’, join:
remap_labels -l person:pedestrian -l human:pedestrian --default keep
Rename ‘person’ to ‘car’ and ‘cat’ to ‘dog’, keep ‘bus’, remove others:
remap_labels -l person:car -l bus:bus -l cat:dog --default delete
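The keep/delete rules above can be sketched on a flat list of annotation labels. This is not Datumaro's implementation: labels are plain strings, `None` marks an unlabeled annotation, an empty new name means "delete", and the label categories themselves are not updated.

```python
from typing import Dict, List, Optional


def remap_labels(labels: List[Optional[str]], mapping: Dict[str, str],
                 default: str = "keep") -> List[Optional[str]]:
    """Remap per-annotation labels and return the labels of surviving annotations.

    mapping: old name -> new name; an empty new name deletes the label.
    default: 'keep' or 'delete' for unmentioned labels and unlabeled annotations.
    """
    result = []
    for label in labels:
        if label is not None and label in mapping:
            new_label = mapping[label]
            if new_label:  # '' means the label, and its annotation, is deleted
                result.append(new_label)
        elif default == "keep":
            result.append(label)
        # with default == 'delete', the annotation is dropped
    return result
```

The three CLI examples correspond to the calls in the checks below.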
- class datumaro.plugins.transforms.ProjectInfos(extractor: IDataset, dst_infos: Dict[str, Any], overwrite: bool = False)
Changes the content of infos. A user can add dataset metadata such as the author, comments, or related papers. Info values do not affect the dataset structure, so any metadata can be added freely.
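The role of `overwrite` can be sketched with plain dictionaries. The semantics here are an assumption: `overwrite=True` replaces the existing infos entirely, while `overwrite=False` merges the new keys into them, with new values winning on conflicts.

```python
from typing import Any, Dict


def project_infos(src_infos: Dict[str, Any], dst_infos: Dict[str, Any],
                  overwrite: bool = False) -> Dict[str, Any]:
    """Combine dataset infos without mutating either input dictionary."""
    if overwrite:
        return dict(dst_infos)  # assumed: discard the existing infos
    merged = dict(src_infos)
    merged.update(dst_infos)  # keys present in both take the new value
    return merged
```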
- class datumaro.plugins.transforms.ProjectLabels(extractor: IDataset, dst_labels: Iterable[str] | LabelCategories)
Bases: ItemTransform
Changes the order of labels in the dataset from the existing to the desired one, removes unknown labels and adds new labels. Updates or removes the corresponding annotations.
Labels are matched by name (case-sensitive). Parent labels are only kept if they are present in the resulting set of labels. If new labels are added and the dataset has mask colors defined, the new labels are assigned generated colors.
Useful for merging similar datasets, whose labels need to be aligned.
- Examples:
Align the source dataset labels to [person, cat, dog]:
project_labels -l person -l cat -l dog
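The index bookkeeping behind label projection can be sketched as follows. This is not Datumaro's code: annotations are represented only by their label indices, and parent labels and mask colors are ignored.

```python
from typing import Dict, List, Optional


def project_label_indices(src_labels: List[str], dst_labels: List[str]) -> Dict[int, Optional[int]]:
    """Map each source label index to its index in dst_labels, or None if it is removed.

    Labels are matched by exact, case-sensitive name.
    """
    dst_index = {name: i for i, name in enumerate(dst_labels)}
    return {i: dst_index.get(name) for i, name in enumerate(src_labels)}


def project_annotations(ann_labels: List[int], src_labels: List[str],
                        dst_labels: List[str]) -> List[int]:
    """Re-index annotation label ids and drop annotations whose label is not kept."""
    mapping = project_label_indices(src_labels, dst_labels)
    return [mapping[idx] for idx in ann_labels if mapping[idx] is not None]
```

In this sketch, `cat` in `[person, cat, dog]` becomes a new label with no annotations, and `Person` does not match `person`.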
- class datumaro.plugins.transforms.AnnsToLabels(extractor: IDataset)
Bases: ItemTransform, CliPlugin
Collects all labels from annotations (of all types) and transforms them into a set of annotations of type Label.
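A minimal sketch of the collection step, using plain dicts as hypothetical stand-ins for Datumaro annotation classes. It assumes duplicate labels collapse into one Label annotation; sorting only makes the output order deterministic.

```python
from typing import Dict, List


def anns_to_labels(annotations: List[Dict]) -> List[Dict]:
    """Turn all labelled annotations of an item into one label-type annotation per distinct label."""
    labels = {ann["label"] for ann in annotations if ann.get("label") is not None}
    # Sorting is only for a deterministic output order in this sketch
    return [{"type": "label", "label": label} for label in sorted(labels)]
```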
- class datumaro.plugins.transforms.BboxValuesDecrement(extractor: IDataset)
Bases: ItemTransform, CliPlugin
Subtracts one from the coordinates of bounding boxes.
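A sketch of the decrement, assuming boxes are stored as top-left `x`, `y` plus width and height, and that only the corner is shifted (for example, to convert 1-based coordinates to 0-based ones). The `Box` type is hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Hypothetical stand-in: top-left corner plus width and height
    x: float
    y: float
    w: float
    h: float


def decrement_box(box: Box) -> Box:
    """Shift the top-left corner by -1 on both axes; width and height are unchanged."""
    return box._replace(x=box.x - 1, y=box.y - 1)
```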
- class datumaro.plugins.transforms.ResizeTransform(extractor: IDataset, width: int, height: int)
Bases: ItemTransform
Resizes images and annotations in the dataset to the specified size. Supports upscaling, downscaling and mixed variants.
- Examples:
Resize all images to 256x256 size
resize -dw 256 -dh 256
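For box-like annotations, resizing amounts to scaling each axis independently, which is what makes the "mixed" case (upscaling one axis, downscaling the other) work. The helper below is a hypothetical illustration, not part of Datumaro.

```python
from typing import Tuple


def resize_bbox(bbox: Tuple[float, float, float, float], src_size: Tuple[int, int],
                dst_size: Tuple[int, int]) -> Tuple[float, float, float, float]:
    """Scale an (x, y, w, h) box from a (width, height) image size to a new one."""
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    x, y, w, h = bbox
    return (x * sx, y * sy, w * sx, h * sy)
```

For a 512x128 image resized to 256x256, `sx` is 0.5 (downscaling) and `sy` is 2.0 (upscaling).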
- class datumaro.plugins.transforms.RemoveItems(extractor: IDataset, ids: Iterable[Tuple[str, str]])
Bases: ItemTransform
Removes specific items from the dataset by their IDs.
This can be useful for cleaning broken or unnecessary samples out of the dataset.
- Examples:
Remove specific items from the dataset
remove_items --id 'image1:train' --id 'image2:test'
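Because each ID is an `(item_id, subset)` pair, an item with the same ID in another subset is kept. A minimal sketch over plain tuples:

```python
from typing import Iterable, List, Tuple


def remove_items(items: List[Tuple[str, str]],
                 ids: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop the (item_id, subset) pairs listed in ids; everything else is kept in order."""
    to_remove = set(ids)
    return [item for item in items if item not in to_remove]
```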
- class datumaro.plugins.transforms.RemoveAnnotations(extractor: IDataset, *, ids: Iterable[Tuple[str, str, int | None]])
Bases: ItemTransform
Removes annotations from specific dataset items.
This can be useful for cleaning broken or unnecessary annotations out of the dataset.
- Examples:
Remove annotations from specific items in the dataset
remove_annotations --id 'image1:train' --id 'image2:test'
- transform_item(item: DatasetItem)
Returns a modified copy of the input item.
Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
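The `ids` signature allows an optional third element. This sketch reads it as an annotation ID, where `None` means "all annotations of the item"; that interpretation is an assumption. The dataset is modelled as a dict from `(item_id, subset)` to a list of annotation IDs.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Key = Tuple[str, str]  # (item_id, subset)


def remove_annotations(dataset: Dict[Key, List[int]],
                       ids: Iterable[Tuple[str, str, Optional[int]]]) -> Dict[Key, List[int]]:
    """Return a copy of dataset with the selected annotations removed.

    (item_id, subset, None) removes every annotation of the item;
    (item_id, subset, ann_id) removes only that annotation.
    """
    remove_all: Set[Key] = set()
    remove_some: Dict[Key, Set[int]] = {}
    for item_id, subset, ann_id in ids:
        if ann_id is None:
            remove_all.add((item_id, subset))
        else:
            remove_some.setdefault((item_id, subset), set()).add(ann_id)
    result = {}
    for key, ann_ids in dataset.items():
        if key in remove_all:
            result[key] = []  # the item itself is kept; only its annotations go
        else:
            drop = remove_some.get(key, set())
            result[key] = [a for a in ann_ids if a not in drop]
    return result
```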
- class datumaro.plugins.transforms.RemoveAttributes(extractor: IDataset, ids: Iterable[Tuple[str, str]] | None = None, attributes: Iterable[str] | None = None)
Bases: ItemTransform
Removes item and annotation attributes in a dataset.
This can be useful for cleaning broken or unnecessary attributes out of the dataset.
- Examples:
Remove the is_crowd attribute from dataset
remove_attributes --attr 'is_crowd'
Remove the occluded attribute from annotations of the 2010_001705 item in the train subset
remove_attributes --id '2010_001705:train' --attr 'occluded'
- transform_item(item: DatasetItem)
Returns a modified copy of the input item.
Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
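How the two optional filters combine can be sketched for a single attribute dict. The defaults are assumptions based on the examples above: `ids=None` targets every item, and `attributes=None` removes all attributes.

```python
from typing import Dict, Iterable, Optional, Tuple


def remove_attributes(item_key: Tuple[str, str], attrs: Dict[str, object],
                      ids: Optional[Iterable[Tuple[str, str]]] = None,
                      attributes: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Return a copy of attrs with the selected attributes removed."""
    if ids is not None and item_key not in set(ids):
        return dict(attrs)  # item not selected: keep its attributes as they are
    if attributes is None:
        return {}  # assumed: no attribute list means remove all of them
    names = set(attributes)
    return {k: v for k, v in attrs.items() if k not in names}
```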
- class datumaro.plugins.transforms.Correct(extractor: IDataset, reports: str | Dict)
Corrects the dataset based on a validation report. The user should pass the validation_reports.json produced by the validator to correct the dataset. This helps to refine the dataset by rejecting undefined labels, missing annotations, and outliers.
- class datumaro.plugins.transforms.AstypeAnnotations(extractor: IDataset, mapping: Dict[str, str] | List[Tuple[str, str]] | None = None)
Bases: ItemTransform
Enables the conversion of annotation types for the categories and individual items within a dataset.
Based on a specified mapping, it transforms the annotation types, changing them to ‘Label’ if they are categorical, and to ‘Caption’ if they are of type string, float, or integer.
- Examples:
Convert type of title annotation
astype_annotations --mapping 'title:text'
- transform_item(item: DatasetItem)
Returns a modified copy of the input item.
Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
- class datumaro.plugins.transforms.Clean(extractor: IDataset)
Bases: ItemTransform
A class used to refine the media items in a dataset.
This class provides methods to clean and preprocess media data within a dataset. The media data can be of various types such as strings, numeric values, or categorical values. The cleaning process for each type of data is handled differently:
- String Media: For string data, the class employs natural language processing (NLP) techniques to remove unnecessary characters. This involves cleaning tasks such as removing special characters, punctuation, and other irrelevant elements to refine the textual data.
- Numeric Media: For numeric data, the class identifies and handles outliers and missing values. Outliers are either removed or replaced based on a defined strategy, and missing values are filled using appropriate methods such as mean, median, or a predefined value.
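The two cleaning paths can be sketched with the standard library. The description does not name the exact outlier rule or fill value, so this sketch assumes the common 1.5 * IQR rule, replaces outliers and missing values with the median, and uses a simple regex in place of real NLP processing. It is an illustration, not Datumaro's implementation.

```python
import re
import statistics
from typing import List, Optional


def clean_text(text: str) -> str:
    """Replace characters other than word characters and whitespace, then collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_numeric(values: List[Optional[float]]) -> List[float]:
    """Fill missing values with the median and replace 1.5 * IQR outliers by the median."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    q1, _, q3 = statistics.quantiles(present, n=4)  # needs at least two values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [median if v is None or not (low <= v <= high) else v for v in values]
```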