datumaro.components.merge.intersect_merge

Classes

IntersectMerge([conf])

Merge several datasets with "intersect" policy:

class datumaro.components.merge.intersect_merge.IntersectMerge(conf=_Nothing.NOTHING)

Bases: Merger

Merge several datasets with “intersect” policy:

  • If there are two or more dataset items whose (id, subset) pairs match each other, we can consider this an intersection in our dataset. This method merges the annotations of the corresponding DatasetItems into one DatasetItem to handle this intersection. The rule for merging annotations is provided by AnnotationMerger according to the annotation type. For example, DatasetItem(id="item_1", subset="train", annotations=[Bbox(0, 0, 1, 1)]) from Dataset-A and DatasetItem(id="item_1", subset="train", annotations=[Bbox(.5, .5, 1, 1)]) from Dataset-B can be merged into DatasetItem(id="item_1", subset="train", annotations=[Bbox(0, 0, 1, 1)]).

  • Label categories are merged according to the union of their label names (same as UnionMerge). For example, if Dataset-A has {"car", "cat", "dog"} and Dataset-B has {"car", "bus", "truck"} labels, the merged dataset will have {"bus", "car", "cat", "dog", "truck"} labels.

  • This merge has configuration parameters (conf) to control the annotation merge behavior.

For example:

```python
from datumaro.components.merge.intersect_merge import IntersectMerge

merge = IntersectMerge(
    conf=IntersectMerge.Conf(
        pairwise_dist=0.25, groups=[], output_conf_thresh=0.0, quorum=0,
    )
)
```

For more details on the parameters, refer to IntersectMerge.Conf.
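The label-union rule from the second bullet can be illustrated with plain Python. This is a minimal sketch, not datumaro code. `union_label_names` is a hypothetical helper, and each dataset's label categories are modeled as a list of names. Keeping labels in first-seen order is an assumption, because the description only fixes the resulting set.

```python
from typing import Iterable, List


def union_label_names(*label_lists: Iterable[str]) -> List[str]:
    """Union of label names across datasets, in first-seen order (assumed)."""
    merged: List[str] = []
    seen = set()
    for labels in label_lists:
        for name in labels:
            if name not in seen:  # skip names already contributed by an earlier dataset
                seen.add(name)
                merged.append(name)
    return merged


dataset_a = ["car", "cat", "dog"]
dataset_b = ["car", "bus", "truck"]
merged = union_label_names(dataset_a, dataset_b)
print(merged)  # ['car', 'cat', 'dog', 'bus', 'truck']
```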

Method generated by attrs for class IntersectMerge.

class Conf(*, pairwise_dist=0.5, sigma=_Nothing.NOTHING, output_conf_thresh=0, quorum=0, ignored_attributes=_Nothing.NOTHING, groups=_Nothing.NOTHING, close_distance=0.75)

Bases: object

Parameters:
  • pairwise_dist – IoU match threshold for segments

  • sigma – Parameter for Object Keypoint Similarity metric (https://cocodataset.org/#keypoints-eval)

  • output_conf_thresh – Confidence threshold for output annotations

  • quorum – Minimum vote count required for a label or attribute voting result to be counted

  • ignored_attributes – Attributes to be ignored in the merged DatasetItem

  • groups – A comma-separated list of labels in annotation groups to check. A '?' postfix can be added to a label to make it optional in the group (repeatable)

  • close_distance – Distance threshold between annotations used to decide whether they are close. Annotations decided to be close are reported to the error tracker.

Method generated by attrs for class IntersectMerge.Conf.
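pairwise_dist is described as an IoU match threshold. The sketch below shows what such a threshold is compared against for two axis-aligned boxes. It makes two assumptions: boxes are (x, y, w, h) tuples, and a pair matches when IoU ≥ pairwise_dist. `iou` and `is_match` are hypothetical helpers, and datumaro's matchers may differ in detail. For the Bbox(0, 0, 1, 1) / Bbox(.5, .5, 1, 1) pair from the class description, the overlap area is 0.25 and the union area is 1.75.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Size of the overlapping rectangle, clamped to 0 for disjoint boxes
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(a: Box, b: Box, pairwise_dist: float = 0.5) -> bool:
    # Assumption: a pair counts as matched when IoU reaches the threshold
    return iou(a, b) >= pairwise_dist


a, b = (0, 0, 1, 1), (0.5, 0.5, 1, 1)
print(round(iou(a, b), 4))  # 0.1429 (0.25 / 1.75)
print(is_match(a, b))       # False with pairwise_dist=0.5
```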

add_item_error(error, *args, **kwargs)
merge(sources: Sequence[IDataset]) → DatasetItemStorage
get_ann_source(ann_id)
merge_categories(sources: Sequence[IDataset]) → Dict
merge_items(items: Dict[int, DatasetItem]) → DatasetItem
merge_annotations(sources)
match_items(datasets)
get_any_label_name(ann, label_id)
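merge_items receives a Dict[int, DatasetItem]. One plausible way to build such dictionaries is to group items from all sources by their (id, subset) key. This sketch assumes the int key is the index of the source dataset. `Item` and `match_items_by_key` are hypothetical stand-ins, not datumaro's implementation of match_items.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Item:  # hypothetical stand-in for DatasetItem
    id: str
    subset: str
    annotations: List[str] = field(default_factory=list)


def match_items_by_key(datasets: List[List[Item]]) -> Dict[Tuple[str, str], Dict[int, Item]]:
    """Group items by (id, subset); each value maps source index -> item."""
    matches: Dict[Tuple[str, str], Dict[int, Item]] = defaultdict(dict)
    for source_idx, dataset in enumerate(datasets):
        for item in dataset:
            matches[(item.id, item.subset)][source_idx] = item
    return dict(matches)


ds_a = [Item("item_1", "train", ["bbox_a"]), Item("item_2", "val")]
ds_b = [Item("item_1", "train", ["bbox_b"])]
groups = match_items_by_key([ds_a, ds_b])
# ("item_1", "train") appears in both sources, so it forms an intersection
print(sorted(groups[("item_1", "train")]))  # [0, 1]
print(sorted(groups[("item_2", "val")]))    # [0]
```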
class datumaro.components.merge.intersect_merge.AnnotationMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMatcher

Method generated by attrs for class AnnotationMerger.

merge_clusters(clusters)
class datumaro.components.merge.intersect_merge.AnnotationType(value)

Bases: IntEnum

An enumeration.

unknown = 0
label = 1
mask = 2
points = 3
polygon = 4
polyline = 5
bbox = 6
caption = 7
cuboid_3d = 8
super_resolution_annotation = 9
depth_annotation = 10
ellipse = 11
hash_key = 12
feature_vector = 13
tabular = 14
rotated_bbox = 15
cuboid_2d = 16
exception datumaro.components.merge.intersect_merge.AnnotationsTooCloseError(item_id, a, b, distance)

Bases: DatasetQualityError

Method generated by attrs for class AnnotationsTooCloseError.

item_id
a
b
distance
class datumaro.components.merge.intersect_merge.BboxMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, BboxMatcher

Method generated by attrs for class BboxMerger.

class datumaro.components.merge.intersect_merge.CaptionsMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMerger, CaptionsMatcher

Method generated by attrs for class CaptionsMerger.

exception datumaro.components.merge.intersect_merge.ConflictingCategoriesError(msg=None, *, sources=None)

Bases: DatasetMergeError

sources
class datumaro.components.merge.intersect_merge.Cuboid2DMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, Cuboid2DMatcher

Method generated by attrs for class Cuboid2DMerger.

class datumaro.components.merge.intersect_merge.Cuboid3dMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, Cuboid3dMatcher

Method generated by attrs for class Cuboid3dMerger.

merge_cluster(cluster)
class datumaro.components.merge.intersect_merge.DatasetItem(id: str, *, subset: str | None = None, media: str | MediaElement | None = None, annotations: List[Annotation] | None = None, attributes: Dict[str, Any] | None = None)

Bases: object

id: str
subset: str
media: MediaElement | None
annotations: Annotations
attributes: Dict[str, Any]
wrap(**kwargs)
media_as(t: Type[T]) → T
class datumaro.components.merge.intersect_merge.DatasetItemStorage

Bases: object

is_empty() → bool
put(item: DatasetItem) → bool
get(id: str | DatasetItem, subset: str | None = None, dummy: Any | None = None) → DatasetItem | None
remove(id: str | DatasetItem, subset: str | None = None) → bool
get_subset(name)
subsets()
get_annotated_items()
get_datasetitem_by_path(path)
get_annotations()
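Both put and remove return bool. The sketch below models that interface by bucketing items by subset and keying them by id. It assumes put reports whether the item was new rather than a replacement, and that remove reports whether anything was deleted. `ItemStorage` is hypothetical and stores arbitrary objects instead of DatasetItem.

```python
from collections import OrderedDict
from typing import Dict, Optional


class ItemStorage:
    """Hypothetical stand-in for DatasetItemStorage: items bucketed by subset, keyed by id."""

    def __init__(self) -> None:
        self.data: Dict[str, "OrderedDict[str, object]"] = {}

    def put(self, item_id: str, subset: str, item: object) -> bool:
        bucket = self.data.setdefault(subset, OrderedDict())
        is_new = item_id not in bucket  # assumed meaning: True = added, False = replaced
        bucket[item_id] = item
        return is_new

    def get(self, item_id: str, subset: str) -> Optional[object]:
        return self.data.get(subset, {}).get(item_id)

    def remove(self, item_id: str, subset: str) -> bool:
        bucket = self.data.get(subset)
        if bucket is None or item_id not in bucket:
            return False  # nothing to remove
        del bucket[item_id]
        return True

    def is_empty(self) -> bool:
        return not any(self.data.values())


storage = ItemStorage()
print(storage.put("item_1", "train", "v1"))  # True: new item
print(storage.put("item_1", "train", "v2"))  # False: replaced existing item
print(storage.get("item_1", "train"))        # v2
print(storage.remove("item_1", "train"))     # True
print(storage.is_empty())                    # True
```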
class datumaro.components.merge.intersect_merge.DatasetItemStorageDatasetView(parent: DatasetItemStorage, infos: Dict[str, Any], categories: Dict[AnnotationType, Categories], media_type: Type[MediaElement] | None, ann_types: Set[AnnotationType] | None)

Bases: IDataset

class Subset(parent: DatasetItemStorageDatasetView, name: str)

Bases: IDataset

put(item)
get(id, subset=None)

Provides random access to dataset items.

remove(id, subset=None)
get_subset(name)
subsets()

Enumerates subsets in the dataset. Each subset can be a dataset itself.

infos()

Returns meta-info of the dataset.

categories()

Returns meta-info about the dataset labels.

media_type()

Returns the media type of the dataset items.

All the items are supposed to have the same media type. The media type is supposed to be constant and known immediately after object construction (i.e. it does not require dataset iteration).

ann_types()

Returns the available task types from the dataset annotation types.

infos()

Returns meta-info of the dataset.

categories()

Returns meta-info about the dataset labels.

get_subset(name)
subsets()

Enumerates subsets in the dataset. Each subset can be a dataset itself.

get(id, subset=None)

Provides random access to dataset items.

media_type()

Returns the media type of the dataset items.

All the items are supposed to have the same media type. The media type is supposed to be constant and known immediately after object construction (i.e. it does not require dataset iteration).

ann_types()

Returns the available task types from the dataset annotation types.

class datumaro.components.merge.intersect_merge.EllipseMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, ShapeMatcher

Method generated by attrs for class EllipseMerger.

exception datumaro.components.merge.intersect_merge.FailedAttrVotingError(item_id, attr, votes, ann, *, sources=_Nothing.NOTHING)

Bases: DatasetMergeError

Method generated by attrs for class FailedAttrVotingError.

item_id
attr
votes
ann
class datumaro.components.merge.intersect_merge.FeatureVectorMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMerger, FeatureVectorMatcher

Method generated by attrs for class FeatureVectorMerger.

class datumaro.components.merge.intersect_merge.HashKeyMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMerger, HashKeyMatcher

Method generated by attrs for class HashKeyMerger.

class datumaro.components.merge.intersect_merge.IDataset

Bases: object

subsets() → Dict[str, IDataset]

Enumerates subsets in the dataset. Each subset can be a dataset itself.

get_subset(name) → IDataset
infos() → Dict[str, Any]

Returns meta-info of the dataset.

categories() → Dict[AnnotationType, Categories]

Returns meta-info about the dataset labels.

get(id: str, subset: str | None = None) → DatasetItem | None

Provides random access to dataset items.

media_type() → Type[MediaElement]

Returns the media type of the dataset items.

All the items are supposed to have the same media type. The media type is supposed to be constant and known immediately after object construction (i.e. it does not require dataset iteration).

ann_types() → List[AnnotationType]

Returns the available task types from the dataset annotation types.

property is_stream: bool

Boolean indicating whether the dataset is a stream.

If the dataset is a stream, dataset items are generated on demand from its iterator.

class datumaro.components.merge.intersect_merge.ImageAnnotationMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMerger, ImageAnnotationMatcher

Method generated by attrs for class ImageAnnotationMerger.

class datumaro.components.merge.intersect_merge.LabelCategories(items: List[str] = _Nothing.NOTHING, label_groups: List[LabelGroup] = _Nothing.NOTHING, *, attributes: Set[str] = _Nothing.NOTHING)

Bases: Categories

Method generated by attrs for class LabelCategories.

class Category(name, parent: str = '', attributes: Set[str] = _Nothing.NOTHING)

Bases: object

Method generated by attrs for class LabelCategories.Category.

name: str
parent: str
attributes: Set[str]
class LabelGroup(name, labels: List[str] = [], group_type: GroupType = GroupType.EXCLUSIVE)

Bases: object

Method generated by attrs for class LabelCategories.LabelGroup.

name: str
labels: List[str]
group_type: GroupType
items: List[str]
label_groups: List[LabelGroup]
classmethod from_iterable(iterable: Iterable[str | Tuple[str] | Tuple[str, str] | Tuple[str, str, List[str]]]) → LabelCategories

Creates a LabelCategories from an iterable.

Parameters:

iterable – This iterable object can be:

  • a list of str, interpreted as a list of Category names

  • a list of tuples of positional arguments, each used to generate a Category with these arguments

Returns: a LabelCategories object
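from_iterable accepts the input shapes listed in its type hint. The sketch below shows how those shapes could map onto Category(name, parent, attributes). It assumes tuples are positional arguments and that omitted fields take the defaults from the Category signature above (parent='' and an empty attribute set). `normalize_label_specs` is a hypothetical helper and does not build a LabelCategories object.

```python
from typing import Iterable, List, Set, Tuple, Union

Spec = Union[str, Tuple[str], Tuple[str, str], Tuple[str, str, List[str]]]


def normalize_label_specs(iterable: Iterable[Spec]) -> List[Tuple[str, str, Set[str]]]:
    """Turn from_iterable-style inputs into (name, parent, attributes) triples."""
    result = []
    for spec in iterable:
        # A bare str is just a label name; a tuple holds positional arguments
        args = (spec,) if isinstance(spec, str) else tuple(spec)
        name = args[0]
        parent = args[1] if len(args) > 1 else ""
        attributes = set(args[2]) if len(args) > 2 else set()
        result.append((name, parent, attributes))
    return result


specs = ["car", ("sedan", "car"), ("truck", "", ["occluded"])]
print(normalize_label_specs(specs))
# [('car', '', set()), ('sedan', 'car', set()), ('truck', '', {'occluded'})]
```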

add(name: str, parent: str | None = None, attributes: Set[str] | None = None) → int
add_label_group(name: str, labels: List[str], group_type: GroupType) → int
find(name: str) → Tuple[int | None, Category | None]
class datumaro.components.merge.intersect_merge.LabelMerger(*, context: IMatcherContext | IMergerContext | None = None, quorum=0)

Bases: AnnotationMerger, LabelMatcher

Method generated by attrs for class LabelMerger.

merge_clusters(clusters)
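LabelMerger takes a quorum argument, and IntersectMerge.Conf describes quorum as the minimum count for voting results to be counted. The following sketch of label voting rests on three assumptions: each source casts at most one vote per label, labels with at least quorum votes are kept, and a kept label's score is the fraction of sources that voted for it. The scoring rule in particular is an assumption, and `vote_labels` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def vote_labels(source_labels: List[List[int]], quorum: int = 0) -> Dict[int, float]:
    """Label voting across sources (illustrative sketch).

    Each inner list holds the label ids one source assigned to an item.
    """
    # set() ensures each source casts at most one vote per label
    votes = Counter(label for labels in source_labels for label in set(labels))
    n_sources = len(source_labels)
    return {
        label: count / n_sources  # assumed score: fraction of agreeing sources
        for label, count in votes.items()
        if count >= quorum  # with the default quorum=0, every voted label is kept
    }


sources = [[0, 1], [0], [0, 2]]
print(vote_labels(sources, quorum=2))  # {0: 1.0}
print(sorted(vote_labels(sources)))    # [0, 1, 2]
```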
class datumaro.components.merge.intersect_merge.LineMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, LineMatcher

Method generated by attrs for class LineMerger.

class datumaro.components.merge.intersect_merge.MaskCategories(colormap: Dict[int, Tuple[int, int, int]] = _Nothing.NOTHING, inverse_colormap: Dict[Tuple[int, int, int], int] | None = None, *, attributes: Set[str] = _Nothing.NOTHING)

Bases: Categories

Describes a color map for segmentation masks.

Method generated by attrs for class MaskCategories.

classmethod generate(size: int = 255, include_background: bool = True) → MaskCategories

Generates MaskCategories with the specified size.

If include_background is True, the result will include the item “0: (0, 0, 0)”, which is typically used as a background color.
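The docstring does not say how the colors are chosen. One common scheme, assumed here, is the PASCAL VOC colormap, which spreads the bits of each index across the R, G and B channels. With the background included, index 0 maps to (0, 0, 0) as described above. `generate_colormap` is a standalone hypothetical helper that returns a plain dict rather than a MaskCategories.

```python
from typing import Dict, Tuple


def generate_colormap(size: int = 255, include_background: bool = True) -> Dict[int, Tuple[int, int, int]]:
    """VOC-style colormap: the bits of each index are spread over R, G and B."""
    offset = 0 if include_background else 1  # without background, skip the all-black index 0
    colormap: Dict[int, Tuple[int, int, int]] = {}
    for i in range(size):
        index = i + offset
        rgb = [0, 0, 0]
        for bit in range(7, -1, -1):  # fill the most significant bit of each channel first
            for channel in range(3):
                rgb[channel] |= ((index >> channel) & 1) << bit
            index >>= 3
        colormap[i] = (rgb[0], rgb[1], rgb[2])
    return colormap


print(generate_colormap(4))
# {0: (0, 0, 0), 1: (128, 0, 0), 2: (0, 128, 0), 3: (128, 128, 0)}
```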

colormap: Dict[int, Tuple[int, int, int]]
property inverse_colormap: Dict[Tuple[int, int, int], int]
class datumaro.components.merge.intersect_merge.MaskMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, MaskMatcher

Method generated by attrs for class MaskMerger.

class datumaro.components.merge.intersect_merge.Merger(**options)

Bases: IMergerContext, CliPlugin

Merge multiple datasets into one dataset.

static merge_infos(sources: Sequence[Dict[str, Any]]) → Dict

Merge several IDataset into one IDataset.

static merge_categories(sources: Sequence[Dict[AnnotationType, Categories]]) → Dict
static merge_media_types(sources: Sequence[IDataset]) → Type[MediaElement] | None
static merge_ann_types(sources: Sequence[IDataset]) → Set[AnnotationType] | None
save_merge_report(path: str) → None
get_any_label_name(ann, label_id)
exception datumaro.components.merge.intersect_merge.NoMatchingAnnError(item_id, ann, *, sources=_Nothing.NOTHING)

Bases: DatasetMergeError

Method generated by attrs for class NoMatchingAnnError.

item_id
ann
exception datumaro.components.merge.intersect_merge.NoMatchingItemError(item_id, *, sources=_Nothing.NOTHING)

Bases: DatasetMergeError

Method generated by attrs for class NoMatchingItemError.

item_id
class datumaro.components.merge.intersect_merge.OrderedDict

Bases: dict

Dictionary that remembers insertion order

clear() → None. Remove all items from od.
popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

keys() → a set-like object providing a view on D's keys
items() → a set-like object providing a view on D's items
values() → an object providing a view on D's values
pop(key[, default]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

copy() → a shallow copy of od
fromkeys(iterable, value=None)

Create a new ordered dictionary with keys from iterable and values set to value.
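This OrderedDict appears to be Python's collections.OrderedDict imported into the module. Its order-related methods can therefore be demonstrated with the standard library alone:

```python
from collections import OrderedDict

od = OrderedDict.fromkeys(["a", "b", "c"], 0)  # keys from an iterable, every value 0
od.move_to_end("a")                # order is now b, c, a
print(list(od))                    # ['b', 'c', 'a']
od.move_to_end("a", last=False)    # move "a" back to the front: a, b, c
print(od.popitem())                # ('c', 0): LIFO by default
print(od.popitem(last=False))      # ('a', 0): FIFO when last=False
print(list(od))                    # ['b']
```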

class datumaro.components.merge.intersect_merge.PointsCategories(items: Dict[int, Category] = _Nothing.NOTHING, *, attributes: Set[str] = _Nothing.NOTHING)

Bases: Categories

Describes (key-)point metainfo such as point names and joints.

Method generated by attrs for class PointsCategories.

class Category(labels: List[str] = _Nothing.NOTHING, joints: Set[Tuple[int, int]] = _Nothing.NOTHING)

Bases: object

Method generated by attrs for class PointsCategories.Category.

labels: List[str]
joints: Set[Tuple[int, int]]
items: Dict[int, Category]
classmethod from_iterable(iterable: Tuple[int, List[str]] | Tuple[int, List[str], Set[Tuple[int, int]]]) → PointsCategories

Create PointsCategories from an iterable.

Parameters:

iterable – An Iterable with the following elements:

  • a label id

  • a list of positional arguments for Categories

Returns:

PointsCategories object

Return type:

PointsCategories

add(label_id: int, labels: Iterable[str] | None = None, joints: Iterable[Tuple[int, int]] | None = None)
class datumaro.components.merge.intersect_merge.PointsMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0, sigma: list | None = None, instance_map)

Bases: _ShapeMerger, PointsMatcher

Method generated by attrs for class PointsMerger.
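PointsMerger's sigma parameter corresponds to the Object Keypoint Similarity (OKS) metric referenced in IntersectMerge.Conf. The sketch below follows the COCO formulation. Each visible keypoint i contributes exp(-d_i² / (2·s²·k_i²)), where d_i is the keypoint distance, s² the object area and k_i = 2·σ_i, and the result is averaged over visible keypoints. How datumaro's PointsMatcher derives the object area is not shown here, so `oks` and its explicit `area` argument are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(a: Sequence[Point], b: Sequence[Point], sigmas: Sequence[float],
        area: float, visible: Sequence[bool] = ()) -> float:
    """COCO-style Object Keypoint Similarity between two keypoint sets of equal length."""
    if not visible:
        visible = [True] * len(a)  # treat every keypoint as visible by default
    total, count = 0.0, 0
    for (ax, ay), (bx, by), sigma, vis in zip(a, b, sigmas, visible):
        if not vis:
            continue  # invisible keypoints do not contribute
        d2 = (ax - bx) ** 2 + (ay - by) ** 2
        k2 = (2 * sigma) ** 2  # COCO's per-keypoint constant k = 2 * sigma
        total += math.exp(-d2 / (2 * area * k2))
        count += 1
    return total / count if count else 0.0


points = [(0.0, 0.0), (10.0, 10.0)]
shifted = [(1.0, 0.0), (10.0, 11.0)]  # each keypoint moved by 1 pixel
print(oks(points, points, [0.1, 0.1], area=100.0))             # 1.0
print(round(oks(points, shifted, [0.1, 0.1], area=100.0), 4))  # 0.8825
```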

class datumaro.components.merge.intersect_merge.PolygonMerger(*, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, PolygonMatcher

Method generated by attrs for class PolygonMerger.

class datumaro.components.merge.intersect_merge.RotatedBboxMerger(sigma: list | None = None, *, context: datumaro.components.abstracts.merger.IMatcherContext | datumaro.components.abstracts.merger.IMergerContext | None = None, pairwise_dist=0.9, cluster_dist=-1.0, match_segments=<function match_segments_pair>, quorum=0)

Bases: _ShapeMerger, RotatedBboxMatcher

Method generated by attrs for class RotatedBboxMerger.

class datumaro.components.merge.intersect_merge.TabularMerger(*, context: IMatcherContext | IMergerContext | None = None)

Bases: AnnotationMerger, TabularMatcher

Method generated by attrs for class TabularMerger.

exception datumaro.components.merge.intersect_merge.WrongGroupError(item_id, found, expected, group)

Bases: DatasetQualityError

Method generated by attrs for class WrongGroupError.

item_id
found
expected
group
datumaro.components.merge.intersect_merge.attrib(default=_Nothing.NOTHING, validator=None, repr=True, cmp=None, hash=None, init=True, metadata=None, type=None, converter=None, factory=None, kw_only=False, eq=None, order=None, on_setattr=None, alias=None)

Create a new field / attribute on a class.

Identical to attrs.field, except it’s not keyword-only.

Consider using attrs.field in new code (attr.ib will never go away, though).

Warning

Does nothing unless the class is also decorated with attr.s (or similar)!

New in version 15.2.0: convert

New in version 16.3.0: metadata

Changed in version 17.1.0: validator can be a list now.

Changed in version 17.1.0: hash is None and therefore mirrors eq by default.

New in version 17.3.0: type

Deprecated since version 17.4.0: convert

New in version 17.4.0: converter as a replacement for the deprecated convert to achieve consistency with other noun-based arguments.

New in version 18.1.0: factory=f is syntactic sugar for default=attr.Factory(f).

New in version 18.2.0: kw_only

Changed in version 19.2.0: convert keyword argument removed.

Changed in version 19.2.0: repr also accepts a custom callable.

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: on_setattr

Changed in version 20.3.0: kw_only backported to Python 2

Changed in version 21.1.0: eq, order, and cmp also accept a custom callable

Changed in version 21.1.0: cmp undeprecated

New in version 22.2.0: alias

datumaro.components.merge.intersect_merge.attrs(maybe_cls=None, these=None, repr_ns=None, repr=None, cmp=None, hash=None, init=None, slots=False, frozen=False, weakref_slot=True, str=False, auto_attribs=False, kw_only=False, cache_hash=False, auto_exc=False, eq=None, order=None, auto_detect=False, collect_by_mro=False, getstate_setstate=None, on_setattr=None, field_transformer=None, match_args=True, unsafe_hash=None)

A class decorator that adds dunder methods according to the specified attributes using attr.ib or the these argument.

Consider using attrs.define / attrs.frozen in new code (attr.s will never go away, though).

Parameters:

repr_ns (str) – When using nested classes, there was no way in Python 2 to automatically detect that. This argument allows setting a custom name for a more meaningful repr output. This argument is pointless in Python 3 and is therefore deprecated.

Caution

Refer to attrs.define for the rest of the parameters, but note that they can have different defaults.

Notably, leaving on_setattr as None will not add any hooks.

New in version 16.0.0: slots

New in version 16.1.0: frozen

New in version 16.3.0: str

New in version 16.3.0: Support for __attrs_post_init__.

Changed in version 17.1.0: hash supports None as value which is also the default now.

New in version 17.3.0: auto_attribs

Changed in version 18.1.0: If these is passed, no attributes are deleted from the class body.

Changed in version 18.1.0: If these is ordered, the order is retained.

New in version 18.2.0: weakref_slot

Deprecated since version 18.2.0: __lt__, __le__, __gt__, and __ge__ now raise a DeprecationWarning if the classes compared are subclasses of each other. __eq__ and __ne__ never tried to compare subclasses to each other.

Changed in version 19.2.0: __lt__, __le__, __gt__, and __ge__ now do not consider subclasses comparable anymore.

New in version 18.2.0: kw_only

New in version 18.2.0: cache_hash

New in version 19.1.0: auto_exc

Deprecated since version 19.2.0: cmp Removal on or after 2021-06-01.

New in version 19.2.0: eq and order

New in version 20.1.0: auto_detect

New in version 20.1.0: collect_by_mro

New in version 20.1.0: getstate_setstate

New in version 20.1.0: on_setattr

New in version 20.3.0: field_transformer

Changed in version 21.1.0: init=False injects __attrs_init__

Changed in version 21.1.0: Support for __attrs_pre_init__

Changed in version 21.1.0: cmp undeprecated

New in version 21.3.0: match_args

New in version 22.2.0: unsafe_hash as an alias for hash (for PEP 681 compliance).

Deprecated since version 24.1.0: repr_ns

Changed in version 24.1.0: Instances are not compared as tuples of attributes anymore, but using one big "and" condition. This is faster and has more correct behavior for uncomparable values like math.nan.

New in version 24.1.0: If a class has an inherited classmethod called __attrs_init_subclass__, it is executed after the class is created.

Deprecated since version 24.1.0: hash is deprecated in favor of unsafe_hash.

datumaro.components.merge.intersect_merge.ensure_cls(c)
datumaro.components.merge.intersect_merge.find(iterable, pred=<function <lambda>>, default=None)
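Judging by its signature, find returns the first element of iterable that satisfies pred, and falls back to default otherwise. The sketch below reproduces that behavior. It assumes the default pred, which the page renders only as <lambda>, accepts every element. This is a standalone reimplementation, not an import of the datumaro function.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find(iterable: Iterable[T], pred: Callable[[T], bool] = lambda x: True,
         default: Optional[T] = None) -> Optional[T]:
    """Return the first element for which pred is true, else default."""
    # next() stops at the first match, so the rest of the iterable is not consumed
    return next((x for x in iterable if pred(x)), default)


print(find([3, 8, 12], lambda x: x > 5))   # 8
print(find([3, 8, 12], lambda x: x > 50))  # None
print(find([], default="empty"))           # empty
```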
datumaro.components.merge.intersect_merge.find_instances(instance_anns)
datumaro.components.merge.intersect_merge.max_bbox(annotations: Iterable[Tuple[float, float, float, float] | _Shape | Mask]) → Tuple[float, float, float, float]

Computes the maximum bbox, i.e. the smallest box enclosing all the given spatial annotations and boxes.

Returns:

(x, y, w, h)

Return type:

bbox (tuple)
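The maximum bbox can be computed as the box that spans the minimum top-left corner and the maximum bottom-right corner of all inputs. This sketch handles only plain (x, y, w, h) tuples, not shapes or masks. Returning (0, 0, 0, 0) for empty input is an assumption. The function is a standalone reimplementation, not an import of datumaro's max_bbox.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x, y, w, h)


def max_bbox(boxes: Iterable[BBox]) -> BBox:
    """Smallest (x, y, w, h) box that encloses every input box."""
    box_list = list(boxes)
    if not box_list:
        return (0, 0, 0, 0)  # assumed behavior for empty input
    x0 = min(x for x, _, _, _ in box_list)
    y0 = min(y for _, y, _, _ in box_list)
    x1 = max(x + w for x, _, w, _ in box_list)  # right edges
    y1 = max(y + h for _, y, _, h in box_list)  # bottom edges
    return (x0, y0, x1 - x0, y1 - y0)


print(max_bbox([(0, 0, 1, 1), (0.5, 0.5, 1, 1)]))  # (0, 0, 1.5, 1.5)
```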