datumaro.components.transformer

Classes

`ItemTransform`(extractor)
`ModelTransform`(extractor, launcher[, ...])	A transformation class for applying a model's inference to dataset items.
`TabularTransform`(extractor[, batch_size, ...])	A transformation class for processing dataset items in batches with optional parallelism.
`Transform`(extractor)	A base class for dataset transformations that change dataset items or their annotations.

class datumaro.components.transformer.Transform(extractor: IDataset)

Bases: DatasetBase, CliPlugin

A base class for dataset transformations that change dataset items or their annotations.

static wrap_item(item, **kwargs)

categories()

Returns meta-information about the dataset labels.

subsets()

Enumerates subsets in the dataset. Each subset can be a dataset itself.

media_type()

Returns the media type of the dataset items.

All items are expected to have the same media type. The media type is expected to be constant and known immediately after object construction (i.e. it does not require iterating over the dataset).

infos()

Returns meta-information about the dataset.

class datumaro.components.transformer.ItemTransform(extractor: IDataset)

Bases: Transform

transform_item(item: DatasetItem) → DatasetItem | None

Returns a modified copy of the input item.

Avoid modifying the input item in place and returning it, because this can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
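The copy-instead-of-mutate rule can be illustrated with a small, self-contained sketch. It does not import Datumaro: `Item`, its `wrap()` method, and `apply_transform` are hypothetical stand-ins that only mimic the pattern. Treating a `None` return value as "skip this item" is an assumption based on the `DatasetItem | None` return type.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a dataset item (not Datumaro's DatasetItem)."""

    id: str
    annotations: Tuple[str, ...] = ()

    def wrap(self, **kwargs) -> "Item":
        # Return a copy with the given fields replaced; the original is untouched.
        return replace(self, **kwargs)


def transform_item(item: Item) -> Optional[Item]:
    # Assumption: returning None means the item is dropped from the output.
    if not item.annotations:
        return None
    # Build a modified copy instead of editing the input item.
    return item.wrap(annotations=tuple(a.upper() for a in item.annotations))


def apply_transform(items: Iterable[Item]) -> Iterator[Item]:
    # Mimics a per-item transform loop: call transform_item, skip None results.
    for item in items:
        result = transform_item(item)
        if result is not None:
            yield result


source = [Item("a", ("cat",)), Item("b"), Item("c", ("dog", "bird"))]
transformed = list(apply_transform(source))
```

Because `transform_item` returns a wrapped copy, the items in `source` keep their original annotations after the transform has run.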

class datumaro.components.transformer.TabularTransform(extractor: IDataset, batch_size: int = 1, num_workers: int = 0)

Bases: Transform

A transformation class for processing dataset items in batches with optional parallelism.

This class takes a dataset extractor, a batch size, and a number of worker threads. Depending on the number of workers specified, it processes items either sequentially (single-process) or in parallel (multi-process), which makes it efficient for batch transformations.

Parameters:

extractor – The dataset extractor to obtain items from.
batch_size – The batch size for processing items. Default is 1.
num_workers – The number of worker threads to use for parallel processing. Set to 0 for single-process mode. Default is 0.
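How `batch_size` and `num_workers` interact can be sketched without Datumaro. The helpers `iter_batches` and `run_batched` below are hypothetical and are not part of the library. The use of a thread pool is also an assumption: the description above mentions both worker threads and multi-process mode, and this sketch does not settle which one the class uses.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    # Group items into lists of at most batch_size elements.
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


def run_batched(
    items: Iterable[T],
    fn: Callable[[T], R],
    batch_size: int = 1,
    num_workers: int = 0,
) -> List[R]:
    def process(batch: List[T]) -> List[R]:
        return [fn(item) for item in batch]

    batches = list(iter_batches(items, batch_size))
    if num_workers == 0:
        # Sequential mode: process one batch after another.
        results = [process(batch) for batch in batches]
    else:
        # Parallel mode: a pool of workers handles batches concurrently;
        # pool.map returns results in input order.
        with ThreadPool(processes=num_workers) as pool:
            results = pool.map(process, batches)
    # Flatten the per-batch outputs back into one list.
    return [out for batch in results for out in batch]


sequential = run_batched(range(7), lambda x: x * x, batch_size=3)
parallel = run_batched(range(7), lambda x: x * x, batch_size=3, num_workers=2)
```

Both calls yield the same results in the same order; only the scheduling of the batches differs.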

transform_item(item: DatasetItem) → DatasetItem | None

Returns a modified copy of the input item.

Avoid modifying the input item in place and returning it, because this can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.

class datumaro.components.transformer.ModelTransform(extractor: IDataset, launcher: Launcher, batch_size: int = 1, append_annotation: bool = False, num_workers: int = 0)

Bases: Transform

A transformation class for applying a model’s inference to dataset items.

This class takes a dataset, a launcher, and other optional parameters, and transforms each dataset item using the model outputs produced by the launcher. It can process items using multiple processes if specified, making it suitable for parallelized inference tasks.

Parameters:

extractor – The dataset extractor to obtain items from.
launcher – The launcher responsible for model inference.
batch_size – The batch size for processing items. Default is 1.
append_annotation – Whether to append inference annotations to existing annotations. Default is False.
num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.
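The effect of `append_annotation` can be sketched with plain dictionaries. Nothing here is Datumaro API: `fake_launcher` and `apply_inference` are hypothetical stand-ins, and reading `append_annotation=False` as "replace existing annotations with the model output" is an assumption inferred from the parameter description.

```python
from typing import Callable, Dict, List, Sequence


def fake_launcher(batch: Sequence[Dict]) -> List[List[str]]:
    # Hypothetical stand-in for model inference: one predicted label per item.
    return [[f"pred:{item['id']}"] for item in batch]


def apply_inference(
    items: Sequence[Dict],
    launcher: Callable[[Sequence[Dict]], List[List[str]]],
    batch_size: int = 1,
    append_annotation: bool = False,
) -> List[Dict]:
    outputs: List[Dict] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Run the model once per batch and pair each item with its predictions.
        for item, predicted in zip(batch, launcher(batch)):
            # Assumption: False discards existing annotations, True keeps them.
            kept = list(item["annotations"]) if append_annotation else []
            # Build a new dict so the input item is not modified.
            outputs.append({**item, "annotations": kept + predicted})
    return outputs


items = [{"id": "a", "annotations": ["gt:cat"]}, {"id": "b", "annotations": []}]
replaced = apply_inference(items, fake_launcher, batch_size=2)
appended = apply_inference(items, fake_launcher, batch_size=2, append_annotation=True)
```

In this sketch, `replaced` carries only the predicted annotations, while `appended` keeps the original annotations and adds the predictions after them.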

get_subset(name)

infos()

Returns meta-information about the dataset.

categories()

Returns meta-information about the dataset labels.

transform_item(item)