otx.api.entities.datasets
This module implements the Dataset entity.
Classes
| DatasetEntity | A dataset consists of a list of DatasetItemEntities and a purpose. |
| DatasetIterator | This DatasetIterator iterates over the dataset lazily. |
| DatasetPurpose | Describes the purpose for the dataset. |
- class otx.api.entities.datasets.DatasetEntity(items: List[TDatasetItemEntity] | None = None, purpose: DatasetPurpose = DatasetPurpose.INFERENCE)
Bases: Generic[TDatasetItemEntity]
A dataset consists of a list of DatasetItemEntities and a purpose.
## With dataset items
This approach assumes the dataset item entities are constructed before the dataset entity is created.
>>> from otx.api.entities.image import Image
>>> from otx.api.entities.annotation import NullAnnotationSceneEntity
>>> from otx.api.entities.dataset_item import DatasetItemEntity
>>> item = DatasetItemEntity(media=Image(file_path="image.jpg"), annotation_scene=NullAnnotationSceneEntity())
>>> dataset = DatasetEntity(items=[item])
## Iterate over dataset
Regardless of the instantiation method chosen, the Dataset will work the same. The dataset can be iterated:
>>> dataset = DatasetEntity(items=[item_1])
>>> for dataset_item in dataset:
...     print(dataset_item)
DatasetItemEntity(
    media=Image(image.jpg, width=640, height=480),
    annotation_scene=NullAnnotationSceneEntity(),
    roi=Annotation(
        shape=Rectangle(
            x=0.0,
            y=0.0,
            width=1.0,
            height=1.0
        ),
        labels=[],
        id=6149e454893b7ebbe3a8faf6
    ),
    subset=NONE
)
A particular item can also be fetched:
>>> first_item = dataset[0]
Or a slice:
>>> first_ten = dataset[:10]
>>> last_ten = dataset[-10:]
## Get a subset of Dataset
To get the test data for validating the network:
>>> from otx.api.entities.subset import Subset
>>> dataset = DatasetEntity()
>>> testing_subset = dataset.get_subset(Subset.TESTING)
This subset is also a DatasetEntity. Its items are the same objects as those in the original dataset, not copies, so altering an item in the subset also alters it in the original.
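The shared-reference behaviour described above can be illustrated without OTX installed. The sketch below uses hypothetical stand-in classes (`Item`, `MiniDataset`, and a two-member `Subset`) that are not part of `otx.api`; it only assumes, as the text states, that a subset holds the same item objects as the original dataset.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class Subset(Enum):
    """Stand-in for the OTX Subset enum (only two members modelled)."""
    TRAINING = auto()
    TESTING = auto()


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item."""
    name: str
    subset: Subset


class MiniDataset:
    """Hypothetical stand-in for DatasetEntity."""

    def __init__(self, items: List[Item]) -> None:
        self._items = list(items)

    def __getitem__(self, index: int) -> Item:
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    def get_subset(self, subset: Subset) -> "MiniDataset":
        # A new container is built, but the items themselves are not copied.
        return MiniDataset([item for item in self._items if item.subset == subset])


dataset = MiniDataset([Item("a.jpg", Subset.TRAINING), Item("b.jpg", Subset.TESTING)])
testing = dataset.get_subset(Subset.TESTING)

# Mutating an item through the subset is visible through the original dataset.
testing[0].name = "renamed.jpg"
```

If independent items are needed, copying them (for example with `copy.deepcopy`) before modification avoids the side effect in this stand-in; whether that is appropriate for real dataset items depends on the media they hold.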
- Parameters:
items (Optional[List[DatasetItemEntity]]) – A list of dataset items to create dataset with. Defaults to None.
purpose (DatasetPurpose) – Purpose for dataset. Refer to DatasetPurpose for more info. Defaults to DatasetPurpose.INFERENCE.
- append(item: TDatasetItemEntity) → None
Append a DatasetItemEntity to the dataset.
Example
Appending a dataset item to a dataset
>>> from otx.api.entities.image import Image
>>> from otx.api.entities.annotation import NullAnnotationSceneEntity
>>> from otx.api.entities.dataset_item import DatasetItemEntity
>>> dataset = DatasetEntity()
>>> media = Image(file_path='image.jpg')
>>> annotation = NullAnnotationSceneEntity()
>>> dataset_item = DatasetItemEntity(media=media, annotation_scene=annotation)
>>> dataset.append(dataset_item)
- Parameters:
item (DatasetItemEntity) – item to append
- get_combined_subset(subsets: List[Subset]) → DatasetEntity
Returns a new DatasetEntity with just the dataset items matching the subsets.
The result is itself a DatasetEntity. Its items are the same objects as those in the original dataset, so altering an item in the output of this function also alters it in the original.
Example
>>> from otx.api.entities.subset import Subset
>>> dataset = DatasetEntity()
>>> training_subset = dataset.get_combined_subset([Subset.TRAINING, Subset.UNLABELED])
- Parameters:
subsets (List[Subset]) – List of subsets to return.
- Returns:
DatasetEntity with items matching subsets
- Return type:
DatasetEntity
- get_labels(include_empty: bool = False) → List[LabelEntity]
Returns the list of all unique labels that are in the dataset.
Note: This does not respect the ROI of the dataset items.
- Parameters:
include_empty (bool) – set to True to include empty label (if exists) in the output. Defaults to False.
- Returns:
list of labels that appear in the dataset
- Return type:
List[LabelEntity]
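The de-duplication and the `include_empty` switch can be sketched as follows. `Label` and `unique_labels` are hypothetical stand-ins, not OTX classes or functions, and the first-seen ordering of the result is a choice made for this sketch; the source does not state what order `get_labels` returns.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for LabelEntity (frozen, so it is hashable)."""
    name: str
    is_empty: bool = False


def unique_labels(items: Iterable[List[Label]], include_empty: bool = False) -> List[Label]:
    """Collect each label once across all items, optionally keeping empty labels."""
    seen: Dict[Label, None] = {}
    for labels in items:
        for label in labels:
            if label.is_empty and not include_empty:
                continue
            # A dict keeps insertion order, so duplicates are dropped
            # while the first-seen order is preserved.
            seen.setdefault(label, None)
    return list(seen)


cat, dog, empty = Label("cat"), Label("dog"), Label("empty", is_empty=True)
per_item_labels = [[cat, dog], [cat], [empty]]

without_empty = unique_labels(per_item_labels)
with_empty = unique_labels(per_item_labels, include_empty=True)
```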
- get_subset(subset: Subset) → DatasetEntity
Returns a new DatasetEntity with just the dataset items matching the subset.
This subset is also a DatasetEntity. Its items are the same objects as those in the original dataset, so altering an item in the output of this function also alters it in the original.
Example
>>> from otx.api.entities.subset import Subset
>>> dataset = DatasetEntity()
>>> training_subset = dataset.get_subset(Subset.TRAINING)
- Parameters:
subset (Subset) – Subset to return.
- Returns:
DatasetEntity with items matching subset
- Return type:
DatasetEntity
- remove(item: TDatasetItemEntity) → None
Remove an item from the dataset.
This function calls remove_at_indices.
- Parameters:
item (DatasetItemEntity) – the item to be deleted.
- Raises:
ValueError – if the input item is not in the dataset
- remove_at_indices(indices: List[int]) → None
Delete items based on the indices.
- Parameters:
indices (List[int]) – the indices of the items that will be deleted from the items.
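The relationship between `remove` and `remove_at_indices` can be sketched with a plain list. `MiniDataset` is a hypothetical stand-in, not the OTX class; deleting from the highest index downward is an assumption of this sketch, chosen so that earlier deletions do not shift the remaining targets.

```python
from typing import List


class MiniDataset:
    """Hypothetical stand-in for DatasetEntity; items are plain strings."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)

    def remove_at_indices(self, indices: List[int]) -> None:
        # Delete from the highest index down so earlier deletions
        # do not shift the positions of the remaining targets.
        for index in sorted(set(indices), reverse=True):
            del self._items[index]

    def remove(self, item: str) -> None:
        # list.index raises ValueError when the item is absent,
        # which matches the documented error for a missing item.
        self.remove_at_indices([self._items.index(item)])


dataset = MiniDataset(["a", "b", "c", "d"])
dataset.remove_at_indices([0, 2])  # drops "a" and "c"
dataset.remove("d")
remaining = dataset._items
```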
- sort_items() → None
Order the dataset items. Does nothing here, but may be overridden in child classes.
- Returns:
None
- with_empty_annotations(annotation_kind: AnnotationSceneKind = AnnotationSceneKind.PREDICTION) → DatasetEntity
Produces a new dataset with empty annotation objects (no shapes or labels).
This is a convenience function to generate a dataset with empty annotations from another dataset. This is particularly useful for evaluation on validation data and to build resultsets.
Assume a dataset containing user annotations.
>>> labeled_dataset = DatasetEntity()  # user annotated dataset
Then, we want to see the performance of our task on this labeled_dataset, which means we need to create a new dataset to be passed for analysis.
>>> prediction_dataset = labeled_dataset.with_empty_annotations()
Later, we can pass this prediction_dataset to the task analysis function. By pairing the labeled_dataset and the prediction_dataset, the resultset can then be constructed. Refer to otx.api.entities.resultset.ResultSetEntity for more info.
- Parameters:
annotation_kind (AnnotationSceneKind) – Sets the empty annotation to this kind. Defaults to AnnotationSceneKind.PREDICTION
- Returns:
a new dataset containing the same items, with empty annotation objects.
- Return type:
DatasetEntity
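The pairing of a labeled dataset with its empty-annotation counterpart can be sketched as below. `Item` and the module-level `with_empty_annotations` function are hypothetical stand-ins that model only media, labels, and an annotation kind given as a string; they are not the OTX entities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item."""
    media: str
    labels: List[str] = field(default_factory=list)
    kind: str = "ANNOTATION"


def with_empty_annotations(items: List[Item], kind: str = "PREDICTION") -> List[Item]:
    # Same media, fresh item objects, no labels.
    return [Item(media=item.media, labels=[], kind=kind) for item in items]


labeled = [Item("a.jpg", ["cat"]), Item("b.jpg", ["dog"])]
predicted = with_empty_annotations(labeled)

# A task would fill `predicted` in; ground truth and predictions
# can then be paired by position for evaluation.
pairs = list(zip(labeled, predicted))
```

In this stand-in the ground-truth labels are untouched because the new items carry their own empty label lists.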
- property purpose: DatasetPurpose
Returns the DatasetPurpose, for example DatasetPurpose.ANALYSIS.
- Returns:
DatasetPurpose
- class otx.api.entities.datasets.DatasetIterator(dataset: DatasetEntity)
Bases: Iterator
This DatasetIterator iterates over the dataset lazily.
Implements collections.abc.Iterator.
- Parameters:
dataset (DatasetEntity) – Dataset to iterate over.
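A lazy iterator built on `collections.abc.Iterator` might look like the sketch below. `LazyIterator` is a hypothetical stand-in, not the OTX class; it assumes only that the wrapped object supports `len()` and integer indexing.

```python
from collections.abc import Iterator, Sequence


class LazyIterator(Iterator):
    """Hypothetical stand-in for DatasetIterator."""

    def __init__(self, dataset: Sequence) -> None:
        self.dataset = dataset
        self.index = 0

    def __next__(self):
        # Items are looked up one at a time, only when requested.
        if self.index >= len(self.dataset):
            raise StopIteration
        item = self.dataset[self.index]
        self.index += 1
        return item


iterator = LazyIterator(["a", "b", "c"])
first = next(iterator)
rest = list(iterator)
```

Subclassing `Iterator` means only `__next__` has to be written; the abstract base class supplies an `__iter__` that returns the iterator itself.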