datumaro.components.dataset_storage#
Classes
- class datumaro.components.dataset_storage.DatasetPatch(data: DatasetItemStorage, infos: Dict[str, Any], categories: Dict[AnnotationType, Categories], updated_items: Dict[Tuple[str, str], ItemStatus], updated_subsets: Dict[str, ItemStatus] | None = None)[source]#
Bases:
object
- class DatasetPatchWrapper(patch: DatasetPatch, parent: IDataset)[source]#
- property updated_subsets: Dict[str, ItemStatus]#
- class datumaro.components.dataset_storage.DatasetStorage(source: IDataset | DatasetItemStorage, infos: Dict[str, Any] | None = None, categories: Dict[AnnotationType, Categories] | None = None, media_type: Type[MediaElement] | None = None, ann_types: Set[AnnotationType] | None = None)[source]#
Bases:
IDataset
- categories() Dict[AnnotationType, Categories] [source]#
Returns metainfo about dataset labels.
- define_categories(categories: Dict[AnnotationType, Categories])[source]#
- media_type() Type[MediaElement] [source]#
Returns media type of the dataset items.
All items are expected to share the same media type. The media type is expected to be constant and known immediately after object construction (i.e. it doesn’t require dataset iteration).
- ann_types() Set[AnnotationType] [source]#
Returns available task type from dataset annotation types.
- put(item: DatasetItem) None [source]#
- get(id: str, subset: str | None = None) DatasetItem | None [source]#
Provides random access to dataset items.
- subsets() Dict[str, IDataset] [source]#
Enumerates subsets in the dataset. Each subset can be a dataset itself.
- get_datasetitem_by_path(path: str) DatasetItem | None [source]#
- get_patch() DatasetPatch [source]#
- update(source: DatasetPatch | IDataset | Iterable[DatasetItem])[source]#
- class datumaro.components.dataset_storage.AnnotationType(value)[source]#
Bases:
IntEnum
An enumeration.
- unknown = 0#
- label = 1#
- mask = 2#
- points = 3#
- polygon = 4#
- polyline = 5#
- bbox = 6#
- cuboid_3d = 8#
- super_resolution_annotation = 9#
- depth_annotation = 10#
- ellipse = 11#
- hash_key = 12#
- feature_vector = 13#
- tabular = 14#
- rotated_bbox = 15#
- cuboid_2d = 16#
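Because AnnotationType is an IntEnum, its members can be looked up by value or by name and compared with plain integers. The following is a minimal sketch of that behavior, assuming a stand-in enum (the hypothetical `AnnotationTypeSketch`) that copies the members listed above instead of importing the real class:

```python
from enum import IntEnum


class AnnotationTypeSketch(IntEnum):
    """Hypothetical stand-in that copies the members listed above."""

    unknown = 0
    label = 1
    mask = 2
    points = 3
    polygon = 4
    polyline = 5
    bbox = 6
    cuboid_3d = 8
    super_resolution_annotation = 9
    depth_annotation = 10
    ellipse = 11
    hash_key = 12
    feature_vector = 13
    tabular = 14
    rotated_bbox = 15
    cuboid_2d = 16


# Look up a member by its integer value or by its name.
by_value = AnnotationTypeSketch(6)
by_name = AnnotationTypeSketch["polygon"]

# IntEnum members compare equal to plain integers and sort by value.
sorted_names = [
    t.name for t in sorted([AnnotationTypeSketch.mask, AnnotationTypeSketch.label])
]
```

The same lookups should apply to any IntEnum, so the real class can be expected to behave alike for the members it defines.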
- exception datumaro.components.dataset_storage.CategoriesRedefinedError[source]#
Bases:
DatasetError
- exception datumaro.components.dataset_storage.ConflictingCategoriesError(msg=None, *, sources=None)[source]#
Bases:
DatasetMergeError
- sources#
- class datumaro.components.dataset_storage.DatasetBase(*, length: int | None = None, subsets: Sequence[str] | None = None, media_type: Type[MediaElement] = <class 'datumaro.components.media.Image'>, ann_types: List[AnnotationType] | None = None, ctx: ImportContext | None = None)[source]#
Bases:
_DatasetBase, CliPlugin
A base class for user-defined and built-in extractors. Use it where SubsetBase is not enough, or where using SubsetBase causes problems with performance, implementation, etc.
- exception datumaro.components.dataset_storage.DatasetInfosRedefinedError[source]#
Bases:
DatasetError
- class datumaro.components.dataset_storage.DatasetItem(id: str, *, subset: str | None = None, media: str | MediaElement | None = None, annotations: List[Annotation] | None = None, attributes: Dict[str, Any] | None = None)[source]#
Bases:
object
- media: MediaElement | None#
- annotations: Annotations#
- class datumaro.components.dataset_storage.DatasetItemStorage[source]#
Bases:
object
- put(item: DatasetItem) bool [source]#
- get(id: str | DatasetItem, subset: str | None = None, dummy: Any | None = None) DatasetItem | None [source]#
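The signatures above suggest a store addressed by item id and subset name. Below is a minimal sketch of that idea, not the library's implementation: `ToyItem` and `ToyItemStorage` are hypothetical names, and the meaning of the bool returned by `put` (here, "the item was not stored before") is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyItem:
    """Hypothetical stand-in for a dataset item: just an id and a subset."""

    id: str
    subset: str = ""


class ToyItemStorage:
    """Hypothetical store mapping subset name -> item id -> item."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, ToyItem]] = {}

    def put(self, item: ToyItem) -> bool:
        # Assumption: the return value reports whether the item is new.
        subset = self._data.setdefault(item.subset, {})
        is_new = item.id not in subset
        subset[item.id] = item
        return is_new

    def get(self, id: str, subset: Optional[str] = None) -> Optional[ToyItem]:
        # A missing subset or id yields None instead of raising.
        return self._data.get(subset or "", {}).get(id)


storage = ToyItemStorage()
first_put = storage.put(ToyItem("img_001", "train"))
second_put = storage.put(ToyItem("img_001", "train"))
found = storage.get("img_001", "train")
missing = storage.get("img_001", "val")
```

Keying by subset first keeps per-subset lookups cheap, which is one plausible reason for a two-level layout.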
- class datumaro.components.dataset_storage.DatasetItemStorageDatasetView(parent: DatasetItemStorage, infos: Dict[str, Any], categories: Dict[AnnotationType, Categories], media_type: Type[MediaElement] | None, ann_types: Set[AnnotationType] | None)[source]#
Bases:
IDataset
- class Subset(parent: DatasetItemStorageDatasetView, name: str)[source]#
Bases:
IDataset
- class datumaro.components.dataset_storage.IDataset[source]#
Bases:
object
- subsets() Dict[str, IDataset] [source]#
Enumerates subsets in the dataset. Each subset can be a dataset itself.
- categories() Dict[AnnotationType, Categories] [source]#
Returns metainfo about dataset labels.
- get(id: str, subset: str | None = None) DatasetItem | None [source]#
Provides random access to dataset items.
- media_type() Type[MediaElement] [source]#
Returns media type of the dataset items.
All items are expected to share the same media type. The media type is expected to be constant and known immediately after object construction (i.e. it doesn’t require dataset iteration).
- ann_types() List[AnnotationType] [source]#
Returns available task type from dataset annotation types.
- class datumaro.components.dataset_storage.ItemStatus(value)[source]#
Bases:
Enum
An enumeration.
- added = 1#
- modified = 2#
- removed = 3#
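DatasetPatch takes an `updated_items` mapping from `(id, subset)` pairs to ItemStatus values. The sketch below only illustrates the shape of such a mapping. It assumes a stand-in enum (`StatusSketch`) and a hypothetical helper (`diff_statuses`) that compares two snapshots, and it makes no claim about how the library computes patches.

```python
from enum import Enum
from typing import Dict, Tuple


class StatusSketch(Enum):
    """Hypothetical stand-in copying the ItemStatus members listed above."""

    added = 1
    modified = 2
    removed = 3


Key = Tuple[str, str]  # (item id, subset name)


def diff_statuses(before: Dict[Key, str], after: Dict[Key, str]) -> Dict[Key, StatusSketch]:
    """Classify every key that differs between two snapshots."""
    statuses: Dict[Key, StatusSketch] = {}
    for key, value in after.items():
        if key not in before:
            statuses[key] = StatusSketch.added
        elif before[key] != value:
            statuses[key] = StatusSketch.modified
    for key in before:
        if key not in after:
            statuses[key] = StatusSketch.removed
    return statuses


before = {("a", "train"): "v1", ("b", "train"): "v1"}
after = {("a", "train"): "v2", ("c", "val"): "v1"}
statuses = diff_statuses(before, after)
```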
- class datumaro.components.dataset_storage.ItemTransform(extractor: IDataset)[source]#
Bases:
Transform
- transform_item(item: DatasetItem) DatasetItem | None [source]#
Returns a modified copy of the input item.
Avoid changing and returning the input item, because it can lead to unexpected problems. Use wrap_item() or item.wrap() to simplify copying.
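The copy-instead-of-mutate advice can be sketched with plain dataclasses. Everything here is an assumption made for illustration: `ToyItem` is a hypothetical stand-in for a dataset item, and `dataclasses.replace` stands in for the copying helpers named above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyItem:
    """Hypothetical stand-in for a dataset item."""

    id: str
    attributes: Dict[str, str] = field(default_factory=dict)


def transform_item(item: ToyItem) -> Optional[ToyItem]:
    # Build a new attributes dict so the input item's dict is never mutated.
    new_attributes = {**item.attributes, "reviewed": "yes"}
    # replace() returns a new ToyItem; the input object is left untouched.
    return replace(item, attributes=new_attributes)


original = ToyItem("img_001", {"source": "camera"})
transformed = transform_item(original)
```

Returning a fresh object means other holders of `original` never observe the change, which is the kind of unexpected problem the note above warns about.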
- class datumaro.components.dataset_storage.LabelCategories(items: List[str] = _Nothing.NOTHING, label_groups: List[LabelGroup] = _Nothing.NOTHING, *, attributes: Set[str] = _Nothing.NOTHING)[source]#
Bases:
Categories
Method generated by attrs for class LabelCategories.
- class Category(name, parent: str = '', attributes: Set[str] = _Nothing.NOTHING)[source]#
Bases:
object
Method generated by attrs for class LabelCategories.Category.
- class LabelGroup(name, labels: List[str] = [], group_type: GroupType = GroupType.EXCLUSIVE)[source]#
Bases:
object
Method generated by attrs for class LabelCategories.LabelGroup.
- label_groups: List[LabelGroup]#
- classmethod from_iterable(iterable: Iterable[str | Tuple[str] | Tuple[str, str] | Tuple[str, str, List[str]]]) LabelCategories [source]#
Creates a LabelCategories from iterable.
- Parameters:
iterable – This iterable object can be:
  - a list of str, which will be interpreted as a list of Category names
  - a list of positional arguments, which will generate Categories with these arguments
Returns: a LabelCategories object
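The two accepted input forms can be illustrated with a small normalizing function. This is a sketch under stated assumptions, not the library's code: `ToyCategory` and `categories_from_iterable` are hypothetical names, and the tuple layout (name, parent, attributes) is taken from the Category signature shown above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple, Union


@dataclass
class ToyCategory:
    """Hypothetical stand-in mirroring Category(name, parent, attributes)."""

    name: str
    parent: str = ""
    attributes: Set[str] = field(default_factory=set)


def categories_from_iterable(iterable: Iterable[Union[str, Tuple]]) -> List[ToyCategory]:
    categories: List[ToyCategory] = []
    for entry in iterable:
        if isinstance(entry, str):
            # A bare string is treated as the category name.
            categories.append(ToyCategory(entry))
        else:
            # A tuple supplies positional arguments: name[, parent[, attributes]].
            name, *rest = entry
            parent = rest[0] if len(rest) > 0 else ""
            attributes = set(rest[1]) if len(rest) > 1 else set()
            categories.append(ToyCategory(name, parent, attributes))
    return categories


cats = categories_from_iterable(["cat", ("kitten", "cat"), ("tabby", "cat", ["striped"])])
```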
- class datumaro.components.dataset_storage.MediaElement(crypter: Crypter = <datumaro.components.crypter.NullCrypter object>, *args, **kwargs)[source]#
Bases:
Generic[AnyData]
- exception datumaro.components.dataset_storage.MediaTypeError[source]#
Bases:
DatumaroError
- exception datumaro.components.dataset_storage.NotAvailableError[source]#
Bases:
DatumaroError
- exception datumaro.components.dataset_storage.RepeatedItemError(item_id)[source]#
Bases:
DatasetError
Method generated by attrs for class RepeatedItemError.
- item_id#
- class datumaro.components.dataset_storage.StreamDatasetStorage(source: IDataset, infos: Dict[str, Any] | None = None, categories: Dict[AnnotationType, Categories] | None = None, media_type: Type[MediaElement] | None = None, ann_types: Set[AnnotationType] | None = None)[source]#
Bases:
DatasetStorage
- put(item: DatasetItem) None [source]#
- get(id: str, subset: str | None = None) DatasetItem | None [source]#
Provides random access to dataset items.
- property subset_names#
- subsets() Dict[str, IDataset] [source]#
Enumerates subsets in the dataset. Each subset can be a dataset itself.
- get_datasetitem_by_path(path: str) DatasetItem | None [source]#
- update(source: DatasetPatch | IDataset | Iterable[DatasetItem])[source]#
- categories() Dict[AnnotationType, Categories] [source]#
Returns metainfo about dataset labels.
- class datumaro.components.dataset_storage.StreamSubset(source: IDataset, subset: str)[source]#
Bases:
IDataset
- subsets() Dict[str, IDataset] [source]#
Enumerates subsets in the dataset. Each subset can be a dataset itself.
- categories() Dict[AnnotationType, Categories] [source]#
Returns metainfo about dataset labels.
- get(id: str, subset: str | None = None) DatasetItem | None [source]#
Provides random access to dataset items.
- media_type() Type[MediaElement] [source]#
Returns media type of the dataset items.
All items are expected to share the same media type. The media type is expected to be constant and known immediately after object construction (i.e. it doesn’t require dataset iteration).
- ann_types() Set[AnnotationType] [source]#
Returns available task type from dataset annotation types.
- class datumaro.components.dataset_storage.Transform(extractor: IDataset)[source]#
Bases:
DatasetBase, CliPlugin
A base class for dataset transformations that change dataset items or their annotations.