datumaro.components.dataset_base#

Classes

DatasetBase(*, length, subsets, media_type, ...)

A base class for user-defined and built-in extractors.

DatasetItem(id, *[, subset, media, ...])

IDataset()

SubsetBase(*, length, subset, media_type, ...)

A base class for simple, single-subset extractors.

class datumaro.components.dataset_base.DatasetItem(id: str, *, subset: str | None = None, media: str | MediaElement | None = None, annotations: List[Annotation] | None = None, attributes: Dict[str, Any] | None = None)[source]#

Bases: object

id: str#
subset: str#
media: MediaElement | None#
annotations: Annotations#
attributes: Dict[str, Any]#
wrap(**kwargs)[source]#
media_as(t: Type[T]) T[source]#
class datumaro.components.dataset_base.IDataset[source]#

Bases: object

subsets() Dict[str, IDataset][source]#

Enumerates subsets in the dataset. Each subset can be a dataset itself.

get_subset(name) IDataset[source]#
infos() Dict[str, Any][source]#

Returns meta-info of dataset.

categories() Dict[AnnotationType, Categories][source]#

Returns metainfo about dataset labels.

get(id: str, subset: str | None = None) DatasetItem | None[source]#

Provides random access to dataset items.

media_type() Type[MediaElement][source]#

Returns media type of the dataset items.

All the items are supposed to have the same media type. Supposed to be constant and known immediately after the object construction (i.e. doesn’t require dataset iteration).

ann_types() List[AnnotationType][source]#

Returns available task type from dataset annotation types.

property is_stream: bool#

Boolean indicating whether the dataset is a stream

If the dataset is a stream, the dataset item is generated on demand from its iterator.

class datumaro.components.dataset_base.DatasetBase(*, length: int | None = None, subsets: ~typing.Sequence[str] | None = None, media_type: ~typing.Type[~datumaro.components.media.MediaElement] = <class 'datumaro.components.media.Image'>, ann_types: ~typing.List[~datumaro.components.annotation.AnnotationType] | None = None, ctx: ~datumaro.components.contexts.importer.ImportContext | None = None)[source]#

Bases: _DatasetBase, CliPlugin

A base class for user-defined and built-in extractors. Should be used in cases, where SubsetBase is not enough, or its use makes problems with performance, implementation etc.

media_type()[source]#

Returns media type of the dataset items.

All the items are supposed to have the same media type. Supposed to be constant and known immediately after the object construction (i.e. doesn’t require dataset iteration).

ann_types()[source]#

Returns available task type from dataset annotation types.

class datumaro.components.dataset_base.SubsetBase(*, length: int | None = None, subset: str | None = None, media_type: ~typing.Type[~datumaro.components.media.MediaElement] = <class 'datumaro.components.media.Image'>, ann_types: ~typing.List[~datumaro.components.annotation.AnnotationType] | None = None, ctx: ~datumaro.components.contexts.importer.ImportContext | None = None)[source]#

Bases: DatasetBase

A base class for simple, single-subset extractors. Should be used by default for user-defined extractors.

infos()[source]#

Returns meta-info of dataset.

categories()[source]#

Returns metainfo about dataset labels.

get(id, subset=None)[source]#

Provides random access to dataset items.

property subset: str#

Subset name of this instance.