datumaro.components.merge#

Functions

get_merger([merge_policy])

Get Merger according to merge_policy.

datumaro.components.merge.get_merger(merge_policy: str = 'exact', *args, **kwargs) Merger[source]#

Get Merger according to merge_policy. You have to choose an appropriate Merger for your purpose. The available merge policies are “union”, “intersect”, and “exact”.

  1. UnionMerge

Merge several datasets with “union” policy:

  • Label categories are merged according to the union of their label names.

For example, if Dataset-A has {“car”, “cat”, “dog”} and Dataset-B has {“car”, “bus”, “truck”} labels, the merged dataset will have {“bust”, “car”, “cat”, “dog”, “truck”} labels.

  • If there are two or more dataset items whose (id, subset) pairs match each other,

both are included in the merged dataset. At this time, since the same (id, subset) pair cannot be duplicated in the dataset, we add a suffix to the id of each source item. For example, if Dataset-A has DatasetItem(id=”magic”, subset=”train”) and Dataset-B has also DatasetItem(id=”magic”, subset=”train”), the merged dataset will have DatasetItem(id=”magic-0”, subset=”train”) and DatasetItem(id=”magic-1”, subset=”train”).

  1. IntersectMerge

Merge several datasets with “intersect” policy:

  • If there are two or more dataset items whose (id, subset) pairs match each other,

we can consider this as having an intersection in our dataset. This method merges the annotations of the corresponding DatasetItem into one DatasetItem to handle this intersection. The rule to handle merging annotations is provided by AnnotationMerger according to their annotation types. For example, DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(0, 0, 1, 1)]) from Dataset-A and DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(.5, .5, 1, 1)]) from Dataset-B can be merged into DatasetItem(id=”item_1”, subset=”train”, annotations=[Bbox(0, 0, 1, 1)]).

  • Label categories are merged according to the union of their label names

(Same as UnionMerge). For example, if Dataset-A has {“car”, “cat”, “dog”} and Dataset-B has {“car”, “bus”, “truck”} labels, the merged dataset will have {“bust”, “car”, “cat”, “dog”, “truck”} labels.

  • This merge has configuration parameters (conf) to control the annotation merge behaviors.

For example,

```python merge = IntersectMerge(

conf=IntersectMerge.Conf(

pairwise_dist=0.25, groups=[], output_conf_thresh=0.0, quorum=0,

)

)#

For more details for the parameters, please refer to IntersectMerge.Conf.

  1. ExactMerge

Merges several datasets using the “simple” algorithm:

  • All datasets should have the same categories

  • items are matched by (id, subset) pairs

  • matching items share the media info available:
    • nothing + nothing = nothing

    • nothing + something = something

    • something A + something B = conflict

  • annotations are matched by value and shared

  • in case of conflicts, throws an error

Modules

datumaro.components.merge.base

datumaro.components.merge.exact_merge

datumaro.components.merge.extractor_merger

datumaro.components.merge.intersect_merge

datumaro.components.merge.union_merge