datumaro.components.merge#
Functions
|
Get |
- datumaro.components.merge.get_merger(merge_policy: str = 'exact', *args, **kwargs) Merger [source]#
Get
Merger
according to merge_policy. You have to choose an appropriate Merger for your purpose. The available merge policies are “union”, “intersect”, and “exact”.UnionMerge
Merge several datasets with “union” policy:
Label categories are merged according to the union of their label names.
For example, if Dataset-A has {“car”, “cat”, “dog”} and Dataset-B has {“car”, “bus”, “truck”} labels, the merged dataset will have {“bust”, “car”, “cat”, “dog”, “truck”} labels.
If there are two or more dataset items whose (id, subset) pairs match each other,
both are included in the merged dataset. At this time, since the same (id, subset) pair cannot be duplicated in the dataset, we add a suffix to the id of each source item. For example, if Dataset-A has DatasetItem(id=”magic”, subset=”train”) and Dataset-B has also DatasetItem(id=”magic”, subset=”train”), the merged dataset will have DatasetItem(id=”magic-0”, subset=”train”) and DatasetItem(id=”magic-1”, subset=”train”).
IntersectMerge
Merge several datasets with “intersect” policy:
If there are two or more dataset items whose (id, subset) pairs match each other,
we can consider this as having an intersection in our dataset. This method merges the annotations of the corresponding
DatasetItem
into oneDatasetItem
to handle this intersection. The rule to handle merging annotations is provided byAnnotationMerger
according to their annotation types. For example, DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(0, 0, 1, 1)]) from Dataset-A and DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(.5, .5, 1, 1)]) from Dataset-B can be merged into DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(0, 0, 1, 1)]).Label categories are merged according to the union of their label names
(Same as UnionMerge). For example, if Dataset-A has {“car”, “cat”, “dog”} and Dataset-B has {“car”, “bus”, “truck”} labels, the merged dataset will have {“bust”, “car”, “cat”, “dog”, “truck”} labels.
This merge has configuration parameters (conf) to control the annotation merge behaviors.
For example,
```python merge = IntersectMerge(
- conf=IntersectMerge.Conf(
pairwise_dist=0.25, groups=[], output_conf_thresh=0.0, quorum=0,
)
)#
For more details for the parameters, please refer to
IntersectMerge.Conf
.ExactMerge
Merges several datasets using the “simple” algorithm:
All datasets should have the same categories
items are matched by (id, subset) pairs
- matching items share the media info available:
nothing + nothing = nothing
nothing + something = something
something A + something B = conflict
annotations are matched by value and shared
in case of conflicts, throws an error
Modules