datumaro.plugins.sam_transforms.bbox_to_inst_mask#

Bbox-to-instance mask transform using Segment Anything Model

Classes

SAMBboxToInstanceMask(extractor[, ...])

Convert bounding boxes to instance masks using the Segment Anything Model.

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.SAMBboxToInstanceMask(extractor: IDataset, inference_server_type: InferenceServerType = InferenceServerType.ovms, host: str = 'localhost', port: int = 9000, timeout: float = 10.0, tls_config: TLSConfig | None = None, protocol_type: ProtocolType = ProtocolType.grpc, to_polygon: bool = False, num_workers: int = 0)[source]#

Bases: ModelTransform, CliPlugin

Convert bounding boxes to instance masks using the Segment Anything Model.

This transform converts all Bbox annotations in each dataset item to Mask or Polygon annotations (Mask is the default). It uses a Segment Anything Model deployed on an OpenVINO™ Model Server or NVIDIA Triton™ Inference Server instance. To launch the server instance, see the guide at this link: openvinotoolkit/datumaro

Parameters:
  • extractor – Dataset to transform

  • inference_server_type – Inference server type: InferenceServerType.ovms or InferenceServerType.triton

  • host – Host address of the server instance

  • port – Port number of the server instance

  • timeout – Timeout limit during communication between the client and the server instance

  • tls_config – Configuration required if the server instance is in the secure mode

  • protocol_type – Communication protocol type with the server instance

  • to_polygon – If true, the output Mask annotations will be converted to Polygon annotations.

  • num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.
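A minimal usage sketch, assuming a Segment Anything Model server instance is already running at the given host and port. `convert_bboxes` is a hypothetical helper, and the dataset path and format in the usage comment are placeholders. The sketch relies on `Dataset.transform` accepting a transform class plus keyword arguments; the Datumaro imports are deferred into the function body so that defining the helper has no side effects.

```python
def convert_bboxes(dataset, host="localhost", port=9000, to_polygon=False):
    """Hypothetical helper: apply SAMBboxToInstanceMask to a Datumaro dataset."""
    # Imported lazily so that defining this helper does not require Datumaro.
    from datumaro.plugins.sam_transforms.bbox_to_inst_mask import (
        InferenceServerType,
        SAMBboxToInstanceMask,
    )

    # Dataset.transform supplies the dataset itself as the `extractor` argument.
    return dataset.transform(
        SAMBboxToInstanceMask,
        inference_server_type=InferenceServerType.ovms,
        host=host,
        port=port,
        to_polygon=to_polygon,
    )


# Usage (placeholder path and format, requires a running server):
#   from datumaro.components.dataset import Dataset
#   dataset = Dataset.import_from("./my_dataset", "coco")
#   dataset = convert_bboxes(dataset, to_polygon=True)
```

Setting `to_polygon=True` asks for Polygon annotations instead of the default Mask annotations.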

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.Bbox(x, y, w, h, *args, **kwargs)[source]#

Bases: Shape

Bbox annotation class. This class represents a bounding box defined by its top-left corner (x, y) and its width and height (w, h).

_type#

The type of annotation, set to AnnotationType.bbox.

Type:

AnnotationType

__init__()[source]#

Initializes the Bbox with its coordinates and dimensions.

x()#

Property to get the x-coordinate of the bounding box.

y()#

Property to get the y-coordinate of the bounding box.

w()#

Property to get the width of the bounding box.

h()#

Property to get the height of the bounding box.

get_area()[source]#

Calculates the area of the bounding box.

get_bbox()[source]#

Returns the bounding box coordinates and dimensions.

as_polygon()[source]#

Returns the bounding box as a list of points forming a polygon.

iou()[source]#

Calculates the Intersection over Union (IoU) with another shape.

wrap()[source]#

Creates a new Bbox instance with updated attributes.

Initialize the Bbox with its top-left corner (x, y) and its width and height (w, h).

Parameters:
  • x (float) – The x-coordinate of the top-left corner.

  • y (float) – The y-coordinate of the top-left corner.

  • w (float) – The width of the bounding box.

  • h (float) – The height of the bounding box.

property x#

Get the x-coordinate of the top-left corner of the bounding box.

Returns:

The x-coordinate of the bounding box.

Return type:

float

property y#

Get the y-coordinate of the top-left corner of the bounding box.

Returns:

The y-coordinate of the bounding box.

Return type:

float

property w#

Get the width of the bounding box.

Returns:

The width of the bounding box.

Return type:

float

property h#

Get the height of the bounding box.

Returns:

The height of the bounding box.

Return type:

float

get_area()[source]#

Calculate the area of the bounding box.

Returns:

The area of the bounding box.

Return type:

float

get_bbox()[source]#

Get the bounding box coordinates and dimensions.

Returns:

The bounding box as [x, y, w, h].

Return type:

List[float]

as_polygon() List[float][source]#

Convert the bounding box into a polygon representation.

Returns:

The bounding box as a polygon.

Return type:

List[float]
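The geometry behind `get_area()` and `as_polygon()` can be sketched without Datumaro. The helper names below are hypothetical, and the corner order (top-left, top-right, bottom-right, bottom-left) is an assumption; the source only states that the result is a polygon as a flat list of floats.

```python
from typing import List


def bbox_area(x: float, y: float, w: float, h: float) -> float:
    """Area of an axis-aligned box given as top-left corner plus size."""
    return w * h


def bbox_as_polygon(x: float, y: float, w: float, h: float) -> List[float]:
    """Flat [x1, y1, x2, y2, ...] list of the four corners.

    Assumption: corners run top-left, top-right, bottom-right, bottom-left.
    """
    return [x, y, x + w, y, x + w, y + h, x, y + h]


print(bbox_area(10, 20, 30, 40))        # 1200
print(bbox_as_polygon(10, 20, 30, 40))  # [10, 20, 40, 20, 40, 60, 10, 60]
```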

iou(other: Shape) float | ~typing.Literal[-1][source]#

Calculate the Intersection over Union (IoU) with another shape.

Parameters:

other (Shape) – The other shape to compare with.

Returns:

The IoU value or -1 if not applicable.

Return type:

Union[float, Literal[-1]]
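For two bounding boxes, the IoU computation can be sketched as follows. `bbox_iou` here is a standalone illustration, not the library function, and it assumes that "not applicable" means the boxes do not overlap; the source does not spell out when -1 is returned.

```python
from typing import Sequence, Union


def bbox_iou(a: Sequence[float], b: Sequence[float]) -> Union[float, int]:
    """IoU of two [x, y, w, h] boxes; returns -1 when they do not overlap.

    Assumption: "not applicable" is taken to mean an empty intersection.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b

    # Width and height of the overlapping rectangle, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    if intersection == 0:
        return -1

    union = aw * ah + bw * bh - intersection
    return intersection / union
```

Two 2x2 boxes offset by one pixel in each direction share an intersection of 1 and a union of 7, so the sketch returns 1/7 for them.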

wrap(**kwargs)[source]#

Create a new Bbox instance with updated attributes.

Parameters:
  • item (Bbox) – The original Bbox instance.

  • kwargs – Additional attributes to update.

Returns:

A new Bbox instance with updated attributes.

Return type:

Bbox

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.CliPlugin[source]#

Bases: object

NAME = 'cli_plugin'#
classmethod build_cmdline_parser(**kwargs)[source]#
classmethod parse_cmdline(args=None)[source]#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.DatasetItem(id: str, *, subset: str | None = None, media: str | MediaElement | None = None, annotations: List[Annotation] | None = None, attributes: Dict[str, Any] | None = None)[source]#

Bases: object

id: str#
subset: str#
media: MediaElement | None#
annotations: Annotations#
attributes: Dict[str, Any]#
wrap(**kwargs)[source]#
media_as(t: Type[T]) T[source]#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.IDataset[source]#

Bases: object

subsets() Dict[str, IDataset][source]#

Enumerates subsets in the dataset. Each subset can be a dataset itself.

get_subset(name) IDataset[source]#
infos() Dict[str, Any][source]#

Returns meta-info of dataset.

categories() Dict[AnnotationType, Categories][source]#

Returns metainfo about dataset labels.

get(id: str, subset: str | None = None) DatasetItem | None[source]#

Provides random access to dataset items.

media_type() Type[MediaElement][source]#

Returns media type of the dataset items.

All items are expected to have the same media type. The value is expected to be constant and known immediately after object construction (i.e., it does not require dataset iteration).

ann_types() List[AnnotationType][source]#

Returns available task type from dataset annotation types.

property is_stream: bool#

Boolean indicating whether the dataset is a stream

If the dataset is a stream, the dataset item is generated on demand from its iterator.

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.InferenceServerType(value)[source]#

Bases: IntEnum

Types of the dedicated inference server

ovms = 0#
triton = 1#
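Because `InferenceServerType` is an `IntEnum`, its members can be looked up by name or by integer value. The sketch below uses a local stand-in class that mirrors the two documented members (it is not the Datumaro class), and `parse_server_type` is a hypothetical helper for mapping user input to a member.

```python
from enum import IntEnum


class InferenceServerType(IntEnum):
    """Local stand-in mirroring the documented members; not the Datumaro class."""

    ovms = 0
    triton = 1


def parse_server_type(name: str) -> InferenceServerType:
    """Hypothetical helper: map a user-supplied string to an enum member."""
    try:
        return InferenceServerType[name.strip().lower()]
    except KeyError:
        valid = ", ".join(member.name for member in InferenceServerType)
        raise ValueError(
            f"unknown server type {name!r}; expected one of: {valid}"
        ) from None
```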
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.Mask(image: ndarray | Callable[[], ndarray], *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Annotation

Represents a 2d single-instance binary segmentation mask.

Method generated by attrs for class Mask.

label: int | None#
z_order: int#
property image: ndarray#
as_class_mask(label_id: int | None = None, ignore_index: int = 0, dtype: dtype | None = None) ndarray[source]#

Produces a class index mask based on the binary mask.

Parameters:
  • label_id – Scalar value to represent the class index of the mask. If not specified, self.label will be used. Defaults to None.

  • ignore_index – Scalar value to fill in the zeros in the binary mask. Defaults to 0.

  • dtype – Data type for the resulting mask. If not specified, it will be inferred from the provided label_id to hold its value. For example, if label_id=255, the inferred dtype will be np.uint8. Defaults to None.

Returns:

Class index mask generated from the binary mask.

Return type:

IndexMaskImage
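The mapping performed by `as_class_mask` can be illustrated with plain nested lists standing in for NumPy arrays. This is a dependency-free sketch of the documented behaviour only: foreground pixels take `label_id`, background pixels take `ignore_index`, and dtype inference is omitted.

```python
from typing import List, Sequence


def as_class_mask(
    binary: Sequence[Sequence[int]], label_id: int, ignore_index: int = 0
) -> List[List[int]]:
    """Replace foreground pixels with label_id and background with ignore_index."""
    return [
        [label_id if pixel else ignore_index for pixel in row] for row in binary
    ]


binary_mask = [[0, 1], [1, 1]]
print(as_class_mask(binary_mask, label_id=3))                    # [[0, 3], [3, 3]]
print(as_class_mask(binary_mask, label_id=3, ignore_index=255))  # [[255, 3], [3, 3]]
```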

as_instance_mask(instance_id: int, ignore_index: int = 0, dtype: dtype | None = None) ndarray[source]#

Produces an instance index mask based on the binary mask.

Parameters:
  • instance_id – Scalar value to represent the instance id.

  • ignore_index – Scalar value to fill in the zeros in the binary mask. Defaults to 0.

  • dtype – Data type for the resulting mask. If not specified, it will be inferred from the provided instance_id to hold its value. For example, if instance_id=255, the inferred dtype will be np.uint8. Defaults to None.

Returns:

Instance index mask generated from the binary mask.

Return type:

IndexMaskImage

get_area() int[source]#
get_bbox() Tuple[int, int, int, int][source]#

Computes the bounding box of the mask.

Returns: [x, y, w, h]

paint(colormap: Dict[int, Tuple[int, int, int]]) ndarray[source]#

Applies a colormap to the mask and produces the resulting image.

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.ModelTransform(extractor: IDataset, launcher: Launcher, batch_size: int = 1, append_annotation: bool = False, num_workers: int = 0)[source]#

Bases: Transform

A transformation class for applying a model’s inference to dataset items.

This class takes a dataset, a launcher, and other optional parameters, and transforms each dataset item using the model outputs produced by the launcher. It can process items using multiple processes if specified, making it suitable for parallelized inference tasks.

Parameters:
  • extractor – The dataset extractor to obtain items from.

  • launcher – The launcher responsible for model inference.

  • batch_size – The batch size for processing items. Default is 1.

  • append_annotation – Whether to append inference annotations to existing annotations. Default is False.

  • num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.
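The roles of `batch_size` and `append_annotation` can be sketched with two small helpers. Both `take_by` and `merge_annotations` are hypothetical names for this illustration, and the replace-versus-append behaviour is an assumption read from the parameter description, not the library's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take_by(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def merge_annotations(
    existing: list, inferred: list, append_annotation: bool = False
) -> list:
    """Assumed semantics: append keeps existing annotations, otherwise replace them."""
    return existing + inferred if append_annotation else list(inferred)
```

With five items and a batch size of 2, `take_by` produces two full batches and a final batch with one item.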

get_subset(name)[source]#
infos()[source]#

Returns meta-info of dataset.

categories()[source]#

Returns metainfo about dataset labels.

transform_item(item)[source]#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.OVMSLauncher(model_name: str, model_interpreter_path: str, model_version: int = 0, host: str = 'localhost', port: int = 9000, timeout: float = 10.0, tls_config: TLSConfig | None = None, protocol_type: ProtocolType = ProtocolType.grpc)[source]#

Bases: LauncherForDedicatedInferenceServer[Union[GrpcClient, HttpClient]]

Inference launcher for OVMS (OpenVINO™ Model Server) (openvinotoolkit/model_server)

Parameters:
  • model_name – Name of the model. It should match the name of the model loaded in the server instance.

  • model_interpreter_path – Path to the Python source file that implements a model interpreter. The model interpreter implements pre-processing of the model input and post-processing of the model output.

  • model_version – Version of the model loaded in the server instance

  • host – Host address of the server instance

  • port – Port number of the server instance

  • timeout – Timeout limit during communication between the client and the server instance

  • tls_config – Configuration required if the server instance is in the secure mode

  • protocol_type – Communication protocol type with the server instance

infer(inputs: ndarray | Dict[str, ndarray]) List[Dict[str, ndarray] | List[Dict[str, ndarray]]][source]#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.Polygon(points, *, id: int = 0, attributes: Dict[str, Any] = _Nothing.NOTHING, group: int = 0, object_id: int = -1, label=None, z_order: int = 0)[source]#

Bases: Shape

Polygon annotation class. This class represents a polygon shape defined by a series of points.

_type#

The type of annotation, set to AnnotationType.polygon.

Type:

AnnotationType

__attrs_post_init__()[source]#

Validates the points to ensure they form a valid polygon.

get_area()[source]#

Calculates the area of the polygon using the shoelace formula.

as_polygon()[source]#

Returns the points of the polygon.

__eq__()[source]#

Compares this polygon with another for equality.

_get_shoelace_area()[source]#

Helper method to calculate the area of the polygon using the shoelace formula.

Method generated by attrs for class Polygon.

get_area()[source]#

Calculate the area of the polygon using the shoelace formula.

Returns:

The area of the polygon.

Return type:

float
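The shoelace formula sums the cross products of consecutive vertices and halves the absolute value. A minimal standalone sketch, assuming the flat `[x1, y1, x2, y2, ...]` point layout used elsewhere on this page (`shoelace_area` is a hypothetical name, not the library's private helper):

```python
from typing import Sequence


def shoelace_area(points: Sequence[float]) -> float:
    """Area of a simple polygon given as a flat [x1, y1, x2, y2, ...] list."""
    if len(points) % 2 != 0 or len(points) < 6:
        raise ValueError("expected an even number of coordinates, at least 3 points")
    xs, ys = points[0::2], points[1::2]
    n = len(xs)
    # Sum the cross products of each vertex with the next one, wrapping around.
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


print(shoelace_area([0, 0, 4, 0, 4, 3, 0, 3]))  # 12.0
```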

as_polygon() List[float][source]#

Return the points of the polygon.

Returns:

The points of the polygon.

Return type:

List[float]

class datumaro.plugins.sam_transforms.bbox_to_inst_mask.ProtocolType(value)[source]#

Bases: IntEnum

Protocol type for communication with dedicated inference server

grpc = 0#
http = 1#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.TLSConfig(client_key_path: str, client_cert_path: str, server_cert_path: str)[source]#

Bases: object

TLS configuration dataclass

Parameters:
  • client_key_path – Path to client key file

  • client_cert_path – Path to client certificate file

  • server_cert_path – Path to server certificate file

client_key_path: str#
client_cert_path: str#
server_cert_path: str#
as_dict() Dict[str, str][source]#
as_grpc_creds() ChannelCredentials[source]#
class datumaro.plugins.sam_transforms.bbox_to_inst_mask.TritonLauncher(model_name: str, model_interpreter_path: str, model_version: int = 0, host: str = 'localhost', port: int = 9000, timeout: float = 10.0, tls_config: TLSConfig | None = None, protocol_type: ProtocolType = ProtocolType.grpc)[source]#

Bases: LauncherForDedicatedInferenceServer[Union[InferenceServerClient, InferenceServerClient]]

Inference launcher for Triton Inference Server (triton-inference-server)

Parameters:
  • model_name – Name of the model. It should match the name of the model loaded in the server instance.

  • model_interpreter_path – Path to the Python source file that implements a model interpreter. The model interpreter implements pre-processing of the model input and post-processing of the model output.

  • model_version – Version of the model loaded in the server instance

  • host – Host address of the server instance

  • port – Port number of the server instance

  • timeout – Timeout limit during communication between the client and the server instance

  • tls_config – Configuration required if the server instance is in the secure mode

  • protocol_type – Communication protocol type with the server instance

infer(inputs: ndarray | Dict[str, ndarray]) List[Dict[str, ndarray] | List[Dict[str, ndarray]]][source]#
datumaro.plugins.sam_transforms.bbox_to_inst_mask.extract_contours(mask)[source]#

Convert an instance mask to polygons

Parameters:
  • mask – a 2d binary mask

  • tolerance – maximum distance from original points of a polygon to the approximated ones

  • area_threshold – minimal area of generated polygons

Returns:

A list of polygons like [[x1,y1, x2,y2 …], […]]
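Each returned polygon is a flat list of alternating x and y coordinates. The sketch below shows one way to regroup such a list into (x, y) pairs; `to_points` is a hypothetical helper, and the `polygons` value is made-up sample data in the documented format, not actual output of `extract_contours`.

```python
from typing import List, Sequence, Tuple


def to_points(flat: Sequence[float]) -> List[Tuple[float, float]]:
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


# Made-up sample data: two polygons in the flat format described above.
polygons = [[0, 0, 4, 0, 4, 3], [10, 10, 12, 10, 12, 12, 10, 12]]
for polygon in polygons:
    print(to_points(polygon))
```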