class datumaro.components.filter.XPathDatasetFilter(extractor: IDataset, xpath: str)[source]#

Bases: ItemTransform

transform_item(item: DatasetItem) DatasetItem | None[source]#

Returns a modified copy of the input item.

Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.

class datumaro.components.filter.XPathAnnotationsFilter(extractor: IDataset, xpath: str, remove_empty: bool = False)[source]#

Bases: ItemTransform

transform_item(item: DatasetItem) DatasetItem | None[source]#

Returns a modified copy of the input item.

Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
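As a rough illustration of what the xpath argument selects, the sketch below applies XPath-style predicates to a hand-written XML encoding of two items, using Python's standard xml.etree.ElementTree. The element names (items, item, id, subset, annotation, type, label) and the sample data are assumptions made for this example, not a description of Datumaro's actual encoding, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of two dataset items; the element names are
# assumptions made for this sketch.
ITEMS_XML = """
<items>
  <item>
    <id>img_001</id>
    <subset>train</subset>
    <annotation><type>bbox</type><label>cat</label></annotation>
    <annotation><type>label</type><label>dog</label></annotation>
  </item>
  <item>
    <id>img_002</id>
    <subset>val</subset>
    <annotation><type>bbox</type><label>dog</label></annotation>
  </item>
</items>
"""

root = ET.fromstring(ITEMS_XML)

# Item-level filtering: keep items whose <subset> child has the text "train".
train_items = root.findall("item[subset='train']")
train_ids = [item.findtext("id") for item in train_items]

# Annotation-level filtering: keep only annotations whose <type> is "bbox".
bbox_labels = [
    ann.findtext("label")
    for ann in root.findall("item/annotation[type='bbox']")
]
```

The first query works at the item level (an item is either kept or dropped), while the second works at the annotation level, which is the distinction between the two XPath filter classes.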

class datumaro.components.filter.UserFunctionDatasetFilter(extractor: IDataset, filter_func: Callable[[DatasetItem], bool])[source]#

Bases: ItemTransform

Filter dataset items using a user-provided Python function.

Parameters:

  • extractor – Datumaro Dataset to filter.

  • filter_func – A Python callable that takes a DatasetItem as its input and returns a boolean. If the return value is True, that DatasetItem will be retained. Otherwise, it is removed.


This example removes dataset items whose image height or width exceeds 1024 pixels:

    from datumaro.components.dataset_base import DatasetItem
    from datumaro.components.filter import UserFunctionDatasetFilter
    from datumaro.components.media import Image

    def filter_func(item: DatasetItem) -> bool:
        # Return True to keep the item, False to remove it
        h, w = item.media_as(Image).size
        return h <= 1024 and w <= 1024

    filtered = UserFunctionDatasetFilter(
        extractor=dataset, filter_func=filter_func)

    # No items with an image height or width greater than 1024
    filtered_items = [item for item in filtered]

transform_item(item: DatasetItem) DatasetItem | None[source]#

Returns a modified copy of the input item.

Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.

class datumaro.components.filter.UserFunctionAnnotationsFilter(extractor: IDataset, filter_func: Callable[[DatasetItem, Annotation], bool], remove_empty: bool = False)[source]#

Bases: ItemTransform

Filter annotations using a user-provided Python function.

Parameters:

  • extractor – Datumaro Dataset to filter.

  • filter_func – A Python callable that takes DatasetItem and Annotation as its inputs and returns a boolean. If the return value is True, the Annotation will be retained. Otherwise, it is removed.

  • remove_empty – If True, a DatasetItem left without any annotations after filtering is removed. Otherwise, such items are kept.


This example removes bounding boxes that cover 50% or more of the image area:

    from datumaro.components.annotation import Annotation, Bbox
    from datumaro.components.dataset_base import DatasetItem
    from datumaro.components.filter import UserFunctionAnnotationsFilter
    from datumaro.components.media import Image

    def filter_func(item: DatasetItem, ann: Annotation) -> bool:
        # If the annotation is not a Bbox, keep it (return True to retain)
        if not isinstance(ann, Bbox):
            return True

        h, w = item.media_as(Image).size
        image_size = h * w
        bbox_size = ann.h * ann.w

        # Accept Bboxes smaller than 50% of the image size
        return bbox_size < 0.5 * image_size

    filtered = UserFunctionAnnotationsFilter(
        extractor=dataset, filter_func=filter_func)

    # No bounding boxes covering 50% or more of their image
    filtered_items = [item for item in filtered]

transform_item(item: DatasetItem) DatasetItem | None[source]#

Returns a modified copy of the input item.

Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
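The retain-on-True contract of filter_func and the effect of remove_empty can be sketched with plain dataclasses. Ann, Item, and filter_annotations below are hypothetical stand-ins written for this illustration, not Datumaro classes; the sketch only mirrors the behavior described in the parameter list above.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Ann:
    """Hypothetical stand-in for an annotation."""
    kind: str
    area: float


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a dataset item."""
    id: str
    annotations: Tuple[Ann, ...] = ()


def filter_annotations(
    item: Item,
    filter_func: Callable[[Item, Ann], bool],
    remove_empty: bool = False,
) -> Optional[Item]:
    # Keep only the annotations for which filter_func returns True.
    kept = tuple(a for a in item.annotations if filter_func(item, a))
    # With remove_empty, an item left without annotations is dropped (None).
    if remove_empty and not kept:
        return None
    # Return a modified copy instead of mutating the input item.
    return replace(item, annotations=kept)


def keep_small(item: Item, ann: Ann) -> bool:
    return ann.area < 50


mixed = Item("a", (Ann("bbox", 10.0), Ann("bbox", 90.0)))
only_big = Item("b", (Ann("bbox", 90.0),))

filtered_mixed = filter_annotations(mixed, keep_small)
kept_empty = filter_annotations(only_big, keep_small)
dropped = filter_annotations(only_big, keep_small, remove_empty=True)
```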

class datumaro.components.filter.Annotation(*, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1)[source]#

Bases: object

A base annotation class.

Derived classes must define the ‘_type’ class variable with a value from the AnnotationType enum.

Method generated by attrs for class Annotation.

id: int#
attributes: Dict[str, Any]#
group: int#
object_id: int#
property type: AnnotationType#
as_dict() Dict[str, Any][source]#

Returns a dictionary { field_name: value }


wrap(**kwargs)

Returns a modified copy of the object

class datumaro.components.filter.AnnotationType(value)[source]#

Bases: IntEnum

An enumeration.

unknown = 0#
label = 1#
mask = 2#
points = 3#
polygon = 4#
polyline = 5#
bbox = 6#
caption = 7#
cuboid_3d = 8#
super_resolution_annotation = 9#
depth_annotation = 10#
ellipse = 11#
hash_key = 12#
feature_vector = 13#
tabular = 14#
rotated_bbox = 15#
cuboid_2d = 16#
class datumaro.components.filter.Bbox(x, y, w, h, *args, **kwargs)[source]#

Bases: Shape

Bbox annotation class. This class represents a bounding box defined by its top-left corner (x, y) and its width and height (w, h).


Attributes:

  • _type – The type of annotation, set to AnnotationType.bbox.

Methods:

  • __init__ – Initializes the Bbox with its coordinates and dimensions.

  • x – Property to get the x-coordinate of the bounding box.

  • y – Property to get the y-coordinate of the bounding box.

  • w – Property to get the width of the bounding box.

  • h – Property to get the height of the bounding box.

  • get_area – Calculates the area of the bounding box.

  • get_bbox – Returns the bounding box coordinates and dimensions.

  • as_polygon – Returns the bounding box as a list of points forming a polygon.

  • iou – Calculates the Intersection over Union (IoU) with another shape.

  • wrap – Creates a new Bbox instance with updated attributes.

Initialize the Bbox with its top-left corner (x, y) and its width and height (w, h).

Parameters:

  • x (float) – The x-coordinate of the top-left corner.

  • y (float) – The y-coordinate of the top-left corner.

  • w (float) – The width of the bounding box.

  • h (float) – The height of the bounding box.

property x#

Get the x-coordinate of the top-left corner of the bounding box.


Returns:

The x-coordinate of the bounding box.

Return type:

float

property y#

Get the y-coordinate of the top-left corner of the bounding box.

Returns:

The y-coordinate of the bounding box.

Return type:

float

property w#

Get the width of the bounding box.

Returns:

The width of the bounding box.

Return type:

float

property h#

Get the height of the bounding box.

Returns:

The height of the bounding box.

Return type:

float



get_area()

Calculate the area of the bounding box.

Returns:

The area of the bounding box.

get_bbox()

Get the bounding box coordinates and dimensions.

Returns:

The bounding box as [x, y, w, h].


as_polygon() List[float][source]#

Convert the bounding box into a polygon representation.


Returns:

The bounding box as a polygon.

Return type:

List[float]


iou(other: Shape) float | ~typing.Literal[-1][source]#

Calculate the Intersection over Union (IoU) with another shape.


Parameters:

other (Shape) – The other shape to compare with.

Returns:

The IoU value or -1 if not applicable.

Return type:

Union[float, Literal[-1]]
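For two axis-aligned boxes given as (x, y, w, h), IoU is the intersection area divided by the union area. The helper below is a minimal standalone sketch with a hypothetical name (bbox_iou); it is not the library implementation and does not model the -1 "not applicable" result.

```python
from typing import Tuple

# (x, y, w, h): top-left corner plus width and height
Box = Tuple[float, float, float, float]


def bbox_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes (illustrative only)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b

    # Overlap along each axis; clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    # Two degenerate (zero-area) boxes have no meaningful IoU; return 0.0 here.
    return intersection / union if union > 0 else 0.0


overlap = bbox_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 2.0))
```

In the usage line, the boxes share a 1 x 1 region and each has area 4, so the union is 7.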


wrap(**kwargs)

Create a new Bbox instance with updated attributes.

Parameters:

  • item (Bbox) – The original Bbox instance.

  • kwargs – Additional attributes to update.

Returns:

A new Bbox instance with updated attributes.

Return type:

Bbox


class datumaro.components.filter.Caption(caption, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1)[source]#

Bases: Annotation

Represents arbitrary text annotations.

Method generated by attrs for class Caption.

caption: str#
class datumaro.components.filter.DatasetItemEncoder[source]#

Bases: object

classmethod encode(item: DatasetItem, categories: CategoriesInfo | None = None) ET.ElementBase[source]#
classmethod encode_image(image: Image) ElementBase[source]#
classmethod encode_annotation_base(annotation: Annotation) ElementBase[source]#
classmethod encode_label_object(obj: Label, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_mask_object(obj: Mask, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_bbox_object(obj: Bbox, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_points_object(obj: Points, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_polygon_object(obj: Polygon, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_polyline_object(obj: PolyLine, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_caption_object(obj: Caption) ElementBase[source]#
classmethod encode_ellipse_object(obj: Ellipse, categories: CategoriesInfo | None) ET.ElementBase[source]#
classmethod encode_annotation(o: Annotation, categories: CategoriesInfo | None = None) ET.ElementBase[source]#
static to_string(encoded_item: ElementBase) str[source]#
class datumaro.components.filter.Ellipse(x1: float, y1: float, x2: float, y2: float, *args, **kwargs)[source]#

Bases: Shape

Ellipse represents an ellipse that is encapsulated by a rectangle.

  • x1 and y1 represent the top-left coordinate of the encapsulating rectangle

  • x2 and y2 represent the bottom-right coordinate of the encapsulating rectangle

Parameters:

  • x1 (float) – left x coordinate of encapsulating rectangle

  • y1 (float) – top y coordinate of encapsulating rectangle

  • x2 (float) – right x coordinate of encapsulating rectangle

  • y2 (float) – bottom y coordinate of encapsulating rectangle

Method generated by attrs for class Shape.

property x1#
property y1#
property x2#
property y2#
property w#
property h#
property c_x#
property c_y#

get_area()

Calculate the area of the shape.

get_bbox()

Calculate and return the bounding box of the shape.

Returns:

The bounding box as [x, y, w, h].

Return type:

Tuple[float, float, float, float]

get_points(num_points: int = 720) List[Tuple[float, float]][source]#

Return points as a list of tuples, e.g. [(x0, y0), (x1, y1), …].


Parameters:

num_points (int) – The number of boundary points of the ellipse. The signature default is 720; one point per degree of interior angle corresponds to num_points=360.

as_polygon(num_points: int = 720) List[float][source]#

Return a polygon as a flat list of coordinates, e.g. [x0, y0, x1, y1, …].

Parameters:

num_points (int) – The number of boundary points of the ellipse. The signature default is 720; one point per degree of interior angle corresponds to num_points=360.
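The relationship between the encapsulating rectangle and the sampled boundary points can be sketched as follows. This is a minimal illustration under stated assumptions: ellipse_points and flatten_points are hypothetical helper names, and the even angular spacing without a repeated end point is a choice made for this sketch that may differ from how the library samples the boundary.

```python
import math
from typing import List, Tuple


def ellipse_points(
    x1: float, y1: float, x2: float, y2: float, num_points: int = 720
) -> List[Tuple[float, float]]:
    """Sample boundary points of the ellipse inscribed in (x1, y1)-(x2, y2)."""
    # Center and semi-axes follow from the encapsulating rectangle.
    c_x, c_y = (x1 + x2) / 2, (y1 + y2) / 2
    r_x, r_y = (x2 - x1) / 2, (y2 - y1) / 2

    points = []
    for i in range(num_points):
        theta = 2 * math.pi * i / num_points  # evenly spaced angles
        points.append((c_x + r_x * math.cos(theta), c_y + r_y * math.sin(theta)))
    return points


def flatten_points(points: List[Tuple[float, float]]) -> List[float]:
    """Convert [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [coord for point in points for coord in point]


tuples = ellipse_points(0.0, 0.0, 4.0, 2.0, num_points=4)
flat = flatten_points(tuples)
```

The two helpers mirror the two output shapes documented above: a list of (x, y) tuples and a flat coordinate list.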

iou(other: Shape) float | ~typing.Literal[-1][source]#
wrap(**kwargs) Ellipse[source]#

Returns a modified copy of the object

class datumaro.components.filter.HashKey(hash_key: ndarray, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1)[source]#

Bases: Annotation

Method generated by attrs for class HashKey.

hash_key: ndarray#
class datumaro.components.filter.Image(size: Tuple[int, int] | None = None, ext: str | None = None, *args, **kwargs)[source]#

Bases: MediaElement[ndarray]

classmethod from_file(path: str, *args, **kwargs)[source]#
classmethod from_numpy(data: ndarray | Callable[[], ndarray], *args, **kwargs)[source]#
classmethod from_bytes(data: bytes | Callable[[], bytes], *args, **kwargs)[source]#
property has_size: bool#

Indicates that size info is cached and won’t require image loading

property size: Tuple[int, int] | None#

Returns (H, W)

property ext: str | None#

Media file extension (with the leading dot)

set_crypter(crypter: Crypter)[source]#
class datumaro.components.filter.ItemTransform(extractor: IDataset)[source]#

Bases: Transform

transform_item(item: DatasetItem) DatasetItem | None[source]#

Returns a modified copy of the input item.

Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
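The copy-instead-of-mutate advice can be illustrated with a frozen dataclass and dataclasses.replace. Item and rename_subset are hypothetical stand-ins for this sketch, not Datumaro APIs, and treating a None return as "drop the item" is an assumption based on the DatasetItem | None return annotation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a dataset item."""
    id: str
    subset: str


def rename_subset(item: Item) -> Optional[Item]:
    # Assumption for this sketch: returning None drops the item.
    if item.subset == "skip":
        return None
    # Build a modified copy; the input object is left untouched.
    return replace(item, subset="train")


original = Item("a", "val")
transformed = rename_subset(original)
```

Because the input is never modified, other consumers holding a reference to the original item keep seeing its original values.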

class datumaro.components.filter.Label(label, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1)[source]#

Bases: Annotation

Method generated by attrs for class Label.

label: int#
class datumaro.components.filter.Mask(image: ndarray | Callable[[], ndarray], *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Annotation

Represents a 2d single-instance binary segmentation mask.

Method generated by attrs for class Mask.

label: int | None#
z_order: int#
property image: ndarray#
as_class_mask(label_id: int | None = None, ignore_index: int = 0, dtype: dtype | None = None) ndarray[source]#

Produces a class index mask based on the binary mask.

Parameters:

  • label_id – Scalar value to represent the class index of the mask. If not specified, self.label will be used. Defaults to None.

  • ignore_index – Scalar value to fill in the zeros in the binary mask. Defaults to 0.

  • dtype – Data type for the resulting mask. If not specified, it will be inferred from the provided label_id to hold its value. For example, if label_id=255, the inferred dtype will be np.uint8. Defaults to None.


Returns:

Class index mask generated from the binary mask.

Return type:

ndarray


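The mapping from a binary mask to a class index mask can be sketched without NumPy, using nested lists. The function name class_mask_from_binary is hypothetical and the sketch only follows the parameter descriptions above: nonzero pixels take label_id, zero pixels take ignore_index.

```python
from typing import List, Sequence


def class_mask_from_binary(
    binary_mask: Sequence[Sequence[int]], label_id: int, ignore_index: int = 0
) -> List[List[int]]:
    """Replace foreground pixels with label_id and background with ignore_index."""
    return [
        [label_id if pixel else ignore_index for pixel in row]
        for row in binary_mask
    ]


binary = [
    [0, 1, 1],
    [0, 0, 1],
]
class_mask = class_mask_from_binary(binary, label_id=3, ignore_index=255)
```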
as_instance_mask(instance_id: int, ignore_index: int = 0, dtype: dtype | None = None) ndarray[source]#

Produces an instance index mask based on the binary mask.

Parameters:

  • instance_id – Scalar value to represent the instance id.

  • ignore_index – Scalar value to fill in the zeros in the binary mask. Defaults to 0.

  • dtype – Data type for the resulting mask. If not specified, it will be inferred from the provided instance_id to hold its value. For example, if instance_id=255, the inferred dtype will be np.uint8. Defaults to None.

Returns:

Instance index mask generated from the binary mask.

Return type:

ndarray


get_area() int[source]#
get_bbox() Tuple[int, int, int, int][source]#

Computes the bounding box of the mask.

Returns: [x, y, w, h]

paint(colormap: Dict[int, Tuple[int, int, int]]) ndarray[source]#

Applies a colormap to the mask and produces the resulting image.

class datumaro.components.filter.Points(points, visibility: List[IntEnum] | None = None, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Shape

Represents an ordered set of points.


Attributes:

  • _type – The type of annotation, set to AnnotationType.points.

  • visibility – A list indicating the visibility status of each point.

Nested Class:

  • Visibility (IntEnum) – Enum representing the visibility state of points. It has three states:

      • absent: Point is absent (0).

      • hidden: Point is hidden (1).

      • visible: Point is visible (2).

Methods:

  • Validates that the number of points is even.

  • get_area – Returns the area covered by the points, always zero.

  • get_bbox – Returns the bounding box containing all visible or hidden points.

Method generated by attrs for class Points.

class Visibility(value)[source]#

Bases: IntEnum

Enum representing the visibility state of points.


  • absent – Point is absent (0).

  • hidden – Point is hidden (1).

  • visible – Point is visible (2).



absent = 0#
hidden = 1#
visible = 2#
visibility: List[IntEnum]#

get_area()

Returns the area covered by the points.

Returns:

Always returns 0.

get_bbox()

Returns the bounding box containing all visible or hidden points.

Returns:

The bounding box as [x0, y0, width, height].


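A bounding box over only the visible or hidden points can be sketched as below. The helper name points_bbox, the flat [x0, y0, x1, y1, …] input layout, and the all-zero result when every point is absent are assumptions made for this illustration; the Visibility values mirror the enum listed above.

```python
from enum import IntEnum
from typing import Sequence, Tuple


class Visibility(IntEnum):
    # Same values as the Visibility enum documented above.
    absent = 0
    hidden = 1
    visible = 2


def points_bbox(
    points: Sequence[float], visibility: Sequence[Visibility]
) -> Tuple[float, float, float, float]:
    """Return (x0, y0, width, height) over points that are not absent."""
    coords = [
        (points[2 * i], points[2 * i + 1])
        for i, state in enumerate(visibility)
        if state != Visibility.absent
    ]
    if not coords:
        # Assumption for this sketch: no usable points gives an empty box.
        return (0.0, 0.0, 0.0, 0.0)

    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0, max(ys) - y0)


bbox = points_bbox(
    [1.0, 2.0, 5.0, 8.0, 100.0, 100.0],
    [Visibility.visible, Visibility.hidden, Visibility.absent],
)
```

The third point is marked absent, so it does not stretch the box.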
class datumaro.components.filter.PolyLine(points, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Shape

PolyLine annotation class. This class represents a polyline shape, which is a series of connected line segments.


Attributes:

  • _type – The type of annotation, set to AnnotationType.polyline.

Methods:

  • as_polygon – Returns the points of the polyline as a polygon.

  • get_area – Returns the area of the polyline, which is always 0.

Method generated by attrs for class PolyLine.


as_polygon()

Convert the shape into a polygon representation.

get_area()

Calculate the area of the shape.

class datumaro.components.filter.Polygon(points, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Shape

Polygon annotation class. This class represents a polygon shape defined by a series of points.


Attributes:

  • _type – The type of annotation, set to AnnotationType.polygon.

Methods:

  • Validates the points to ensure they form a valid polygon.

  • get_area – Calculates the area of the polygon using the shoelace formula.

  • as_polygon – Returns the points of the polygon.

  • __eq__ – Compares this polygon with another for equality.

  • Helper method to calculate the area of the polygon using the shoelace formula.

Method generated by attrs for class Polygon.


get_area()

Calculate the area of the polygon using the shoelace formula.

Returns:

The area of the polygon.


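The shoelace formula sums the cross products of consecutive vertices and halves the absolute value. The function below is a standalone sketch with a hypothetical name (shoelace_area) that takes the flat [x0, y0, x1, y1, …] layout; it illustrates the formula and is not the library code.

```python
from typing import Sequence


def shoelace_area(points: Sequence[float]) -> float:
    """Area of a simple polygon given as a flat [x0, y0, x1, y1, ...] list."""
    if len(points) % 2 != 0:
        raise ValueError("points must contain an even number of coordinates")
    n = len(points) // 2
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        x_i, y_i = points[2 * i], points[2 * i + 1]
        x_j, y_j = points[2 * j], points[2 * j + 1]
        twice_area += x_i * y_j - x_j * y_i

    # The sign depends on vertex order (clockwise or not), so take abs().
    return abs(twice_area) / 2.0


square_area = shoelace_area([0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0])
triangle_area = shoelace_area([0.0, 0.0, 4.0, 0.0, 0.0, 3.0])
```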
as_polygon() List[float][source]#

Return the points of the polygon.


Returns:

The points of the polygon.

Return type:

List[float]
