datumaro.plugins.sam_transforms

Transforms using the Segment Anything Model (SAM)

class datumaro.plugins.sam_transforms.SAMBboxToInstanceMask(extractor: IDataset, inference_server_type: InferenceServerType = InferenceServerType.ovms, host: str = 'localhost', port: int = 9000, timeout: float = 10.0, tls_config: TLSConfig | None = None, protocol_type: ProtocolType = ProtocolType.grpc, to_polygon: bool = False, num_workers: int = 0)

Bases: ModelTransform, CliPlugin

Convert bounding boxes to instance masks using the Segment Anything Model.

This transform converts all the Bbox annotations in each dataset item to Mask or Polygon annotations (Mask is the default). It uses a Segment Anything Model deployed in an OpenVINO™ Model Server or NVIDIA Triton™ Inference Server instance. To launch the server instance, see the guide at this link: openvinotoolkit/datumaro

Parameters:

extractor – Dataset to transform
inference_server_type – Inference server type: InferenceServerType.ovms or InferenceServerType.triton
host – Host address of the server instance
port – Port number of the server instance
timeout – Timeout limit during communication between the client and the server instance
tls_config – Configuration required if the server instance is in the secure mode
protocol_type – Communication protocol type with the server instance
to_polygon – If true, the output Mask annotations will be converted to Polygon annotations.
num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.
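As a minimal usage sketch, the transform class can be passed to `Dataset.transform` together with the keyword arguments listed above. The helper names `sam_bbox_to_mask_kwargs` and `convert_bboxes` are hypothetical and introduced only for illustration; the sketch also assumes that a SAM server is already running at the given host and port, and it leaves `inference_server_type`, `tls_config`, and `protocol_type` at their defaults.

```python
from typing import Any, Dict


def sam_bbox_to_mask_kwargs(
    host: str = "localhost",
    port: int = 9000,
    timeout: float = 10.0,
    to_polygon: bool = False,
    num_workers: int = 0,
) -> Dict[str, Any]:
    """Collect keyword arguments using the defaults from the signature above.

    Hypothetical helper, not part of Datumaro.
    """
    return {
        "host": host,
        "port": port,
        "timeout": timeout,
        "to_polygon": to_polygon,
        "num_workers": num_workers,
    }


def convert_bboxes(dataset_path: str, dataset_format: str, **overrides: Any):
    """Import a dataset and apply the bbox-to-mask transform (hypothetical helper).

    Assumes a reachable inference server that serves SAM.
    """
    # Imported lazily so this module loads even when Datumaro is not installed.
    from datumaro.components.dataset import Dataset
    from datumaro.plugins.sam_transforms import SAMBboxToInstanceMask

    dataset = Dataset.import_from(dataset_path, dataset_format)
    dataset.transform(SAMBboxToInstanceMask, **sam_bbox_to_mask_kwargs(**overrides))
    return dataset
```

Calling `convert_bboxes(path, fmt, to_polygon=True)` would request Polygon annotations instead of the default Mask annotations; the path and format are whatever your dataset requires.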

class datumaro.plugins.sam_transforms.SAMAutomaticMaskGeneration(extractor: IDataset, inference_server_type: InferenceServerType = InferenceServerType.ovms, host: str = 'localhost', port: int = 9000, timeout: float = 10.0, tls_config: TLSConfig | None = None, protocol_type: ProtocolType = ProtocolType.grpc, num_workers: int = 0, points_per_side: int = 32, points_per_batch: int = 128, mask_threshold: float = 0.0, pred_iou_thresh: float = 0.88, stability_score_thresh: float = 0.95, stability_score_offset: float = 1.0, box_nms_thresh: float = 0.7, min_mask_region_area: int = 0)

Bases: ModelTransform, CliPlugin

Produce instance segmentation masks automatically using the Segment Anything Model (SAM).

This transform can produce instance segmentation mask annotations for each given image. It samples single-point input prompts on a uniform 2D grid over the image. For each prompt, SAM can predict multiple masks. After obtaining the mask candidates, it post-processes them using the given parameters to improve quality and remove duplicates.

It uses a Segment Anything Model deployed in an OpenVINO™ Model Server or NVIDIA Triton™ Inference Server instance. To launch the server instance, see the guide at this link: openvinotoolkit/datumaro
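The grid sampling described above can be sketched in a few lines. This is an illustration under stated assumptions, not Datumaro's implementation: it assumes points are placed at the centers of equal grid cells in normalized [0, 1] coordinates (as in the original SAM reference code), and the function names `build_point_grid` and `batched` are hypothetical.

```python
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


def build_point_grid(points_per_side: int) -> List[Point]:
    """Return points_per_side**2 (x, y) prompts on a uniform 2D grid.

    Assumption: coordinates are normalized to [0, 1] and each point sits
    at the center of its grid cell.
    """
    if points_per_side <= 0:
        raise ValueError("points_per_side must be positive")
    step = 1.0 / points_per_side
    offset = step / 2
    coords = [offset + i * step for i in range(points_per_side)]
    # Row-major order: y varies slowest, x varies fastest.
    return [(x, y) for y in coords for x in coords]


def batched(points: List[Point], points_per_batch: int) -> Iterator[List[Point]]:
    """Yield consecutive chunks of at most points_per_batch prompts."""
    for start in range(0, len(points), points_per_batch):
        yield points[start:start + points_per_batch]


grid = build_point_grid(32)
batches = list(batched(grid, 128))
```

With the default `points_per_side=32` and `points_per_batch=128`, the grid holds 32**2 = 1024 prompts, which split into 8 batches.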

Parameters:

extractor – Dataset to transform
inference_server_type – Inference server type: InferenceServerType.ovms or InferenceServerType.triton
host – Host address of the server instance
port – Port number of the server instance
timeout – Timeout limit during communication between the client and the server instance
tls_config – Configuration required if the server instance is in the secure mode
protocol_type – Communication protocol type with the server instance
num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.
points_per_side (int) – The number of points to be sampled along one side of the image. The total number of points is points_per_side**2 on a uniform 2D grid.
points_per_batch (int) – Sets the number of points run simultaneously by the model. Higher numbers may be faster but use more GPU memory.
pred_iou_thresh (float) – A filtering threshold in [0,1], using the model’s predicted mask quality.
stability_score_thresh (float) – A filtering threshold in [0,1], using the stability of the mask under changes to the cutoff used to binarize the model’s mask predictions.
stability_score_offset (float) – The amount to shift the cutoff when calculating the stability score.
box_nms_thresh (float) – The box IoU cutoff used by non-maximal suppression to filter duplicate masks.
min_mask_region_area (int) – If >0, postprocessing will be applied to remove any binary mask whose number of 1s is less than min_mask_region_area.
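To make the filtering parameters more concrete, the sketch below shows one way the stability score and the two quality thresholds could interact. It follows the definition used in the original SAM reference code (the IoU between the mask binarized at `mask_threshold + stability_score_offset` and at `mask_threshold - stability_score_offset`); whether Datumaro's implementation matches it exactly, including the use of `>=` in the comparisons, is an assumption. The function names are hypothetical.

```python
from typing import List


def stability_score(
    mask_logits: List[List[float]],
    mask_threshold: float = 0.0,
    stability_score_offset: float = 1.0,
) -> float:
    """IoU between a strict and a loose binarization of the same mask logits.

    The strict mask is a subset of the loose one, so the IoU reduces to the
    ratio of their pixel counts.
    """
    values = [v for row in mask_logits for v in row]
    strict = sum(1 for v in values if v > mask_threshold + stability_score_offset)
    loose = sum(1 for v in values if v > mask_threshold - stability_score_offset)
    # Guard against an empty loose mask (added for this sketch).
    return strict / loose if loose else 0.0


def keep_mask(
    pred_iou: float,
    score: float,
    pred_iou_thresh: float = 0.88,
    stability_score_thresh: float = 0.95,
) -> bool:
    """Keep a candidate only if it passes both quality thresholds."""
    return pred_iou >= pred_iou_thresh and score >= stability_score_thresh


# A 2x2 toy mask: two confident pixels, one borderline pixel, one background pixel.
toy_logits = [[2.0, 2.0], [0.5, -2.0]]
toy_score = stability_score(toy_logits)
```

In the toy example, two pixels survive the strict cutoff and three survive the loose one, so the score is 2/3 and the candidate would be rejected at the default `stability_score_thresh` of 0.95.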

property points_per_side: int

Modules

`datumaro.plugins.sam_transforms.automatic_mask_gen`	Automatic mask generation using Segment Anything Model
`datumaro.plugins.sam_transforms.bbox_to_inst_mask`	Bbox-to-instance mask transform using Segment Anything Model
`datumaro.plugins.sam_transforms.interpreters`	Segment Anything Model interpreters