otx.api.entities.datasets

This module implements the Dataset entity.

Classes

DatasetEntity([items, purpose])

A dataset consists of a list of DatasetItemEntities and a purpose.

DatasetIterator(dataset)

This DatasetIterator iterates over the dataset lazily.

DatasetPurpose(value)

Describes the purpose for the dataset.

class otx.api.entities.datasets.DatasetEntity(items: List[TDatasetItemEntity] | None = None, purpose: DatasetPurpose = DatasetPurpose.INFERENCE)

Bases: Generic[TDatasetItemEntity]

A dataset consists of a list of DatasetItemEntities and a purpose.

## With dataset items

This way assumes the dataset item entities are constructed before the dataset entity is made.

>>> from otx.api.entities.image import Image
>>> from otx.api.entities.annotation import NullAnnotationSceneEntity
>>> from otx.api.entities.dataset_item import DatasetItemEntity
>>> item = DatasetItemEntity(media=Image(file_path="image.jpg"), annotation_scene=NullAnnotationSceneEntity())
>>> dataset = DatasetEntity(items=[item])

## Iterate over dataset

Regardless of the instantiation method chosen, the Dataset will work the same. The dataset can be iterated:

>>> dataset = DatasetEntity(items=[item])
>>> for dataset_item in dataset:
...     print(dataset_item)
DatasetItemEntity(
    media=Image(image.jpg, width=640, height=480),
    annotation_scene=NullAnnotationSceneEntity(),
    roi=Annotation(
        shape=Rectangle(
            x=0.0,
            y=0.0,
            width=1.0,
            height=1.0
            ),
            labels=[],
            id=6149e454893b7ebbe3a8faf6
        ),
    subset=NONE
)

A particular item can also be fetched:

>>> first_item = dataset[0]

Or a slice:

>>> first_ten = dataset[:10]
>>> last_ten = dataset[-10:]
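The indexing and slicing calls above can be illustrated with a small stand-in container. This is only a sketch: `MiniDataset` is a hypothetical class, not part of otx, and it assumes that a slice yields a plain list of items, which the text above does not state.

```python
from typing import List, Union


class MiniDataset:
    """Hypothetical stand-in for DatasetEntity; items are plain strings here."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, key: Union[int, slice]) -> Union[str, List[str]]:
        if isinstance(key, slice):
            # Assumption: a slice returns a list of the selected items
            return [self._items[i] for i in range(*key.indices(len(self._items)))]
        return self._items[key]


dataset = MiniDataset([f"item_{i}" for i in range(25)])
first_item = dataset[0]
first_ten = dataset[:10]
last_ten = dataset[-10:]
```

Negative slice bounds are resolved by `slice.indices`, so `dataset[-10:]` selects the final ten items whatever the dataset length.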

## Get a subset of Dataset

To get the test data for validating the network:

>>> from otx.api.entities.subset import Subset
>>> dataset = DatasetEntity()
>>> testing_subset = dataset.get_subset(Subset.TESTING)

This subset is also a DatasetEntity. Its items are the same objects as those in the original dataset, so altering an item through the subset also alters it in the original.
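The shared-reference behaviour can be sketched with stand-in classes. `Subset`, `Item` and `MiniDataset` below are hypothetical simplifications written for this illustration; they are not the otx classes, and only the aliasing behaviour described above is modelled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Subset(Enum):
    # Hypothetical stand-in for the otx Subset enum
    TRAINING = auto()
    TESTING = auto()


@dataclass
class Item:
    # Hypothetical stand-in for DatasetItemEntity
    name: str
    subset: Subset


@dataclass
class MiniDataset:
    # Hypothetical stand-in for DatasetEntity
    items: List[Item] = field(default_factory=list)

    def get_subset(self, subset: Subset) -> "MiniDataset":
        # A new container that holds the very same item objects (no copies)
        return MiniDataset([item for item in self.items if item.subset == subset])


original = MiniDataset([Item("a.jpg", Subset.TRAINING), Item("b.jpg", Subset.TESTING)])
testing = original.get_subset(Subset.TESTING)

testing.items[0].name = "renamed.jpg"  # mutate through the subset
shared_name = original.items[1].name   # the original item reflects the change
same_object = testing.items[0] is original.items[1]
```

If independent items are needed, they would have to be copied explicitly before being modified.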

Parameters:
  • items (Optional[List[DatasetItemEntity]]) – A list of dataset items to create dataset with. Defaults to None.

  • purpose (DatasetPurpose) – Purpose for dataset. Refer to DatasetPurpose for more info. Defaults to DatasetPurpose.INFERENCE.

append(item: TDatasetItemEntity) -> None

Append a DatasetItemEntity to the dataset.

Example

Appending a dataset item to a dataset

>>> from otx.api.entities.image import Image
>>> from otx.api.entities.annotation import NullAnnotationSceneEntity
>>> from otx.api.entities.dataset_item import DatasetItemEntity
>>> dataset = DatasetEntity()
>>> media = Image(file_path='image.jpg')
>>> annotation = NullAnnotationSceneEntity()
>>> dataset_item = DatasetItemEntity(media=media, annotation_scene=annotation)
>>> dataset.append(dataset_item)

Parameters:

item (DatasetItemEntity) – item to append

get_combined_subset(subsets: List[Subset]) -> DatasetEntity

Returns a new DatasetEntity with just the dataset items matching the subsets.

The result is also a DatasetEntity. Its items are the same objects as those in the original dataset, so altering an item in the output of this function also alters it in the original.

Example

>>> dataset = DatasetEntity()
>>> training_subset = dataset.get_combined_subset([Subset.TRAINING, Subset.UNLABELED])

Parameters:

subsets (List[Subset]) – List of subsets to return.

Returns:

DatasetEntity with items matching subsets

Return type:

DatasetEntity
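As a rough illustration of the selection rule, the sketch below keeps every item whose subset is one of the requested subsets. The `Subset` enum, `Item` tuple and `combined_subset` function are hypothetical stand-ins written for this example, not the otx implementation; preserving the original item order is an assumption.

```python
from enum import Enum, auto
from typing import List, NamedTuple


class Subset(Enum):
    # Hypothetical stand-in for the otx Subset enum
    TRAINING = auto()
    VALIDATION = auto()
    UNLABELED = auto()


class Item(NamedTuple):
    # Hypothetical stand-in for DatasetItemEntity
    name: str
    subset: Subset


def combined_subset(items: List[Item], subsets: List[Subset]) -> List[Item]:
    # Keep items belonging to any requested subset, in their original order
    wanted = set(subsets)
    return [item for item in items if item.subset in wanted]


items = [
    Item("a", Subset.TRAINING),
    Item("b", Subset.VALIDATION),
    Item("c", Subset.UNLABELED),
]
selected = combined_subset(items, [Subset.TRAINING, Subset.UNLABELED])
selected_names = [item.name for item in selected]
```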

get_labels(include_empty: bool = False) -> List[LabelEntity]

Returns the list of all unique labels that are in the dataset.

Note: The ROI of each dataset item is not taken into account, so labels that fall outside an item's ROI are still included.

Parameters:

include_empty (bool) – set to True to include empty label (if exists) in the output. Defaults to False.

Returns:

list of labels that appear in the dataset

Return type:

List[LabelEntity]
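The effect of include_empty can be sketched as follows. `Label` and `unique_labels` are hypothetical stand-ins for this illustration; returning labels in first-seen order is an assumption, since the text above does not specify an order.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Label:
    # Hypothetical stand-in for LabelEntity; frozen makes it hashable
    name: str
    is_empty: bool = False


def unique_labels(labels_per_item: List[List[Label]], include_empty: bool = False) -> List[Label]:
    seen: Set[Label] = set()
    result: List[Label] = []
    for labels in labels_per_item:
        for label in labels:
            if label.is_empty and not include_empty:
                continue  # skip the empty label unless explicitly requested
            if label not in seen:
                seen.add(label)
                result.append(label)
    return result


cat, dog, empty = Label("cat"), Label("dog"), Label("empty", is_empty=True)
per_item = [[cat, dog], [cat], [empty]]

names_default = [label.name for label in unique_labels(per_item)]
names_with_empty = [label.name for label in unique_labels(per_item, include_empty=True)]
```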

get_subset(subset: Subset) -> DatasetEntity

Returns a new DatasetEntity with just the dataset items matching the subset.

This subset is also a DatasetEntity. Its items are the same objects as those in the original dataset, so altering an item in the output of this function also alters it in the original.

Example

>>> dataset = DatasetEntity()
>>> training_subset = dataset.get_subset(Subset.TRAINING)

Parameters:

subset (Subset) – Subset to return.

Returns:

DatasetEntity with items matching subset

Return type:

DatasetEntity

remove(item: TDatasetItemEntity) -> None

Remove an item from the items.

This method delegates to remove_at_indices.

Parameters:

item (DatasetItemEntity) – the item to be deleted.

Raises:

ValueError – if the input item is not in the dataset

remove_at_indices(indices: List[int]) -> None

Delete items based on the indices.

Parameters:

indices (List[int]) – the indices of the items that will be deleted from the items.
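The relationship between remove and remove_at_indices can be sketched with a stand-in class. `MiniDataset` is hypothetical and stores plain strings; deleting from the highest index downward is a design choice of this sketch, not a documented detail of otx.

```python
from typing import List


class MiniDataset:
    """Hypothetical stand-in for DatasetEntity; items are plain strings here."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)

    def remove_at_indices(self, indices: List[int]) -> None:
        # Delete from the highest index down so earlier deletions
        # do not shift the positions of later targets
        for index in sorted(set(indices), reverse=True):
            del self._items[index]

    def remove(self, item: str) -> None:
        # list.index raises ValueError when the item is not present
        index = self._items.index(item)
        self.remove_at_indices([index])


dataset = MiniDataset(["a", "b", "c", "d"])
dataset.remove_at_indices([0, 2])
remaining_after_indices = list(dataset._items)

dataset.remove("d")
remaining_after_remove = list(dataset._items)

try:
    dataset.remove("missing")
    raised = False
except ValueError:
    raised = True
```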

sort_items() -> None

Order the dataset items. Does nothing here, but may be overridden in child classes.

Returns:

None

with_empty_annotations(annotation_kind: AnnotationSceneKind = AnnotationSceneKind.PREDICTION) -> DatasetEntity

Produces a new dataset with empty annotation objects (no shapes or labels).

This is a convenience function to generate a dataset with empty annotations from another dataset. This is particularly useful for evaluation on validation data and to build resultsets.

Assume a dataset containing user annotations.

>>> labeled_dataset = DatasetEntity()  # user annotated dataset

Then, we want to see the performance of our task on this labeled_dataset, which means we need to create a new dataset to be passed for analysis.

>>> prediction_dataset = labeled_dataset.with_empty_annotations()

Later, we can pass this prediction_dataset to the task analysis function. By pairing the labeled_dataset and the prediction_dataset, the resultset can then be constructed. Refer to otx.api.entities.resultset.ResultSetEntity for more info.

Parameters:

annotation_kind (AnnotationSceneKind) – Sets the empty annotation to this kind. Defaults to AnnotationSceneKind.PREDICTION

Returns:

a new dataset containing the same items, with empty annotation objects.

Return type:

DatasetEntity
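The idea can be sketched with stand-in dataclasses: each new item keeps its media but receives a fresh, empty annotation scene of the requested kind, and the labeled items are left untouched. `Scene`, `Item` and the free function below are hypothetical simplifications, with the kind represented as a plain string instead of AnnotationSceneKind.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Scene:
    # Hypothetical stand-in for an annotation scene
    kind: str
    shapes: List[str] = field(default_factory=list)


@dataclass
class Item:
    # Hypothetical stand-in for DatasetItemEntity
    media: str
    annotation_scene: Scene


def with_empty_annotations(items: List[Item], annotation_kind: str = "PREDICTION") -> List[Item]:
    # Build new items that reuse the media but carry an empty scene
    return [replace(item, annotation_scene=Scene(kind=annotation_kind)) for item in items]


labeled = [Item("a.jpg", Scene("ANNOTATION", ["box"]))]
prediction = with_empty_annotations(labeled)
```

Because the labeled items keep their annotations, the two collections can later be paired item by item, which is the pairing the text above describes for building a resultset.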

property purpose: DatasetPurpose

Returns the DatasetPurpose, for example DatasetPurpose.INFERENCE.

Returns:

DatasetPurpose

class otx.api.entities.datasets.DatasetIterator(dataset: DatasetEntity)

Bases: Iterator

This DatasetIterator iterates over the dataset lazily.

Implements collections.abc.Iterator.

Parameters:

dataset (DatasetEntity) – Dataset to iterate over.
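A lazy iterator of this kind can be sketched as below: it keeps only a reference to the dataset and a position, and fetches each item when it is requested. `RecordingDataset` and `LazyIterator` are hypothetical stand-ins; the recording list exists only to make the laziness visible.

```python
from collections.abc import Iterator
from typing import List


class RecordingDataset:
    """Hypothetical stand-in for DatasetEntity that records which indices were fetched."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)
        self.fetched: List[int] = []

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> str:
        self.fetched.append(index)
        return self._items[index]

    def __iter__(self) -> "LazyIterator":
        return LazyIterator(self)


class LazyIterator(Iterator):
    """Fetches one item per __next__ call; Iterator supplies __iter__."""

    def __init__(self, dataset: RecordingDataset) -> None:
        self.dataset = dataset
        self.index = 0

    def __next__(self) -> str:
        if self.index >= len(self.dataset):
            raise StopIteration
        item = self.dataset[self.index]
        self.index += 1
        return item


dataset = RecordingDataset(["a", "b", "c"])
iterator = iter(dataset)
fetched_before = list(dataset.fetched)     # nothing fetched yet
first = next(iterator)
fetched_after_one = list(dataset.fetched)  # only index 0 so far
rest = list(iterator)
```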

class otx.api.entities.datasets.DatasetPurpose(value)

Bases: Enum

Describes the purpose for the dataset.

This makes it possible to identify datasets for a particular use.