geti_sdk.annotation_readers
Introduction
The annotation_readers package contains the AnnotationReader base class, which provides an interface for implementing custom annotation readers.
Annotation readers serve to load annotation files in custom formats and convert them to the Intel® Geti™ format, so that they can be uploaded to an Intel® Geti™ project.
Module contents
- class geti_sdk.annotation_readers.base_annotation_reader.AnnotationReader(base_data_folder: str, annotation_format: str = '.json', task_type: TaskType | str = TaskType.DETECTION, anomaly_reduction: bool = False)
Bases: object
Base class for annotation reading, which handles loading annotations and converting them to the Intel® Geti™ format.
- abstract get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]
Return the annotation data for the dataset item corresponding to filename.
- get_data_filenames() → List[str]
Return a list of annotation files found in the base_data_folder.
- Returns:
List of filenames (excluding extension) for all annotation files in the data folder
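As a rough illustration of this return value, the sketch below lists the files in a folder that carry the reader's annotation_format extension and returns their names without the extension. It assumes a flat, non-recursive scan and sorted output; list_annotation_stems is a hypothetical helper, not part of geti_sdk.

```python
from pathlib import Path
from typing import List


def list_annotation_stems(base_data_folder: str, annotation_format: str = ".json") -> List[str]:
    """Hypothetical helper: names (without extension) of annotation files in a folder.

    Assumes a flat folder layout; the real AnnotationReader may scan differently.
    """
    folder = Path(base_data_folder)
    # Keep only regular files whose suffix matches the annotation format, e.g. ".json"
    return sorted(
        path.stem
        for path in folder.iterdir()
        if path.is_file() and path.suffix == annotation_format
    )
```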
- abstract get_all_label_names() → List[str]
Return a list of unique label names that were found in the annotation data folder belonging to this AnnotationReader instance.
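A custom reader has to provide the two abstract methods, get_data and get_all_label_names. The sketch below shows that pattern using a small stand-in base class (MiniAnnotationReader) so it runs without geti_sdk installed; in real code you would subclass geti_sdk.annotation_readers.base_annotation_reader.AnnotationReader and return geti_sdk Annotation objects. The JSON layout {"labels": [...]} and the plain dicts returned by get_data are assumptions made for illustration only.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class MiniAnnotationReader(ABC):
    """Stand-in for geti_sdk's AnnotationReader, mirroring its two abstract methods."""

    def __init__(self, base_data_folder: str, annotation_format: str = ".json"):
        self.base_folder = Path(base_data_folder)
        self.annotation_format = annotation_format

    @abstractmethod
    def get_data(self, filename: str, label_name_to_id_mapping: dict,
                 media_information: Any = None,
                 preserve_shape_for_global_labels: bool = False) -> List[Any]:
        ...

    @abstractmethod
    def get_all_label_names(self) -> List[str]:
        ...


class LabelListReader(MiniAnnotationReader):
    """Hypothetical reader for files shaped like {"labels": ["dog", "cat"]}."""

    def _load(self, filename: str) -> Dict[str, Any]:
        path = self.base_folder / f"{filename}{self.annotation_format}"
        with path.open() as f:
            return json.load(f)

    def get_data(self, filename, label_name_to_id_mapping, media_information=None,
                 preserve_shape_for_global_labels=False):
        # Plain dicts stand in for geti_sdk Annotation objects in this sketch
        return [
            {"label": name, "label_id": label_name_to_id_mapping[name]}
            for name in self._load(filename)["labels"]
        ]

    def get_all_label_names(self):
        # Collect unique label names across every annotation file in the folder
        names = set()
        for path in self.base_folder.glob(f"*{self.annotation_format}"):
            with path.open() as f:
                names.update(json.load(f)["labels"])
        return sorted(names)
```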
- prepare_and_set_dataset(task_type: TaskType | str, previous_task_type: TaskType | None = None) → None
Prepare a dataset for uploading annotations for a certain task_type.
- Parameters:
task_type – TaskType to prepare the dataset for
previous_task_type – Optional type of the (trainable) task preceding the current task in the pipeline. This is only used for global tasks
- property applied_filters: List[Dict[str, List[str] | str]]
Return a list of dictionaries representing the filter settings that have been applied to the dataset, if any.
Dictionaries in this list contain two keys:
- ‘labels’ – List of label names that have been filtered on
- ‘criterion’ – String representing the criterion that has been used in the filtering. Can be ‘OR’, ‘AND’, ‘XOR’ or ‘NOT’.
- Returns:
List of filter settings that have been applied to the dataset. Returns an empty list if no filters have been applied.
- class geti_sdk.annotation_readers.geti_annotation_reader.GetiAnnotationReader(base_data_folder: str, annotation_format: str = '.json', task_type: TaskType | str | None = None, label_names_to_include: List[str] | None = None, anomaly_reduction: bool = False)
Bases: AnnotationReader
AnnotationReader for loading annotation files in Intel® Geti™ format.
- get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]
Return the annotation data for the dataset item corresponding to filename.
- Parameters:
filename – name of the item to get the annotation data for.
label_name_to_id_mapping – mapping of label name to label id.
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels –
False to convert shapes for global tasks to full rectangles (required for classification-like tasks in Intel® Geti™ projects), True to preserve such shapes. This parameter should be:
- False when uploading annotations to a single-task project
- True when uploading annotations for a classification-like task following a local task in a task-chain project
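The guidance above reduces to a simple rule, and the "full rectangle" conversion needs only the media size carried by media_information. A minimal sketch under two assumptions: rectangles are expressed as (x, y, width, height) in absolute pixel coordinates, and both helper names are hypothetical rather than geti_sdk functions.

```python
from typing import Tuple


def full_image_rectangle(width: int, height: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: (x, y, width, height) of a rectangle covering the whole media item."""
    return (0, 0, width, height)


def should_preserve_shapes(is_task_chain: bool, follows_local_task: bool) -> bool:
    """Hypothetical helper encoding the documented choice for preserve_shape_for_global_labels."""
    # Single-task project: global shapes are converted to full rectangles
    if not is_task_chain:
        return False
    # Classification-like task after a local (e.g. detection) task: keep the shapes
    return follows_local_task
```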
- Returns:
List of Annotation objects containing all annotations for the given dataset item.
- get_all_label_names() → List[str]
Retrieve the unique label names for all annotations in the annotation folder
- Returns:
List of label names
- class geti_sdk.annotation_readers.datumaro_annotation_reader.datumaro_annotation_reader.DatumAnnotationReader(base_data_folder: str, annotation_format: str, task_type: TaskType | str = TaskType.DETECTION)
Bases: AnnotationReader
Class to read annotations using datumaro
- prepare_and_set_dataset(task_type: TaskType | str, previous_task_type: TaskType | None = None) → None
Prepare the dataset for a specific task_type. This could involve for example conversion of annotation shapes.
- Parameters:
task_type – TaskType for which to prepare the dataset.
previous_task_type – TaskType preceding the task to prepare the dataset for
- convert_labels_to_segmentation_names() → None
Convert the label names in a dataset to ‘*_shape’, where * is the original label name. This can be used to generate unique label names for the segmentation task in a detection_to_segmentation project.
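The renaming rule can be sketched on a label map shaped like datum_label_map (datumaro label id to label name). The function name is hypothetical; it illustrates the naming convention only, not how DatumAnnotationReader applies it internally.

```python
from typing import Dict


def to_segmentation_names(label_map: Dict[int, str]) -> Dict[int, str]:
    """Hypothetical helper: append '_shape' to every label name, keeping the label ids."""
    return {label_id: f"{name}_shape" for label_id, name in label_map.items()}
```

In a detection_to_segmentation project, a detection label such as car and a segmentation label car_shape can then coexist without a name clash.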
- get_all_label_names() → List[str]
Retrieve the list of label names from a datumaro dataset.
- property datum_label_map: Dict[int, str]
- Returns:
Dictionary mapping the datumaro label id to the label name
- override_label_map(new_label_map: Dict[int, str])
Override the label map defined in the datumaro dataset
- reset_label_map()
Reset the label map back to the original one from the datumaro dataset.
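The override/reset pair can be pictured as a temporary replacement of datum_label_map. The class below is a hypothetical stand-in, not DatumAnnotationReader itself; it assumes that overriding fully replaces the map (rather than merging into it) and that resetting restores the map originally read from the dataset.

```python
from typing import Dict, Optional


class LabelMapHolder:
    """Hypothetical illustration of override_label_map / reset_label_map semantics."""

    def __init__(self, original: Dict[int, str]):
        self._original = dict(original)
        self._override: Optional[Dict[int, str]] = None

    @property
    def label_map(self) -> Dict[int, str]:
        # The override, when set, takes precedence over the dataset's own map
        return dict(self._override if self._override is not None else self._original)

    def override_label_map(self, new_label_map: Dict[int, str]) -> None:
        self._override = dict(new_label_map)

    def reset_label_map(self) -> None:
        # Drop the override so the original map applies again
        self._override = None
```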
- get_all_image_names() → List[str]
Return a list of image names in the dataset
- get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]
Return the annotation data for the dataset item corresponding to filename.
- Parameters:
filename – name of the item to get the annotation data for.
label_name_to_id_mapping – mapping of label name to label id.
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels –
False to convert shapes for global tasks to full rectangles (required for classification-like tasks in Intel® Geti™ projects), True to preserve such shapes. This parameter should be:
- False when uploading annotations to a single-task project
- True when uploading annotations for a classification-like task following a local task in a task-chain project
- Returns:
List of Annotation objects containing all annotations for the given dataset item.
- property applied_filters: List[Dict[str, List[str] | str]]
Return a list of filters and their parameters that have been previously applied to the dataset.
- filter_dataset(labels: Sequence[str], criterion='OR') → None
Retain only those items that have annotations with the labels passed, according to criterion.
- Parameters:
labels – List of labels to filter on
criterion – Filter criterion; currently only “OR” and “AND” are implemented
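The two implemented criteria can be sketched with set operations, under the assumption that “OR” keeps items carrying at least one of the labels and “AND” keeps items carrying all of them. The item layout (item name to set of label names) and both function names are hypothetical; record_filter only mirrors the documented shape of an applied_filters entry.

```python
from typing import Dict, List, Sequence, Set


def filter_items(items: Dict[str, Set[str]], labels: Sequence[str], criterion: str = "OR") -> List[str]:
    """Hypothetical helper: names of items whose labels satisfy the criterion."""
    wanted = set(labels)
    if criterion == "OR":
        # At least one of the requested labels is present
        return [name for name, present in items.items() if present & wanted]
    if criterion == "AND":
        # Every requested label is present
        return [name for name, present in items.items() if wanted <= present]
    raise ValueError(f"Unsupported criterion: {criterion!r}")


def record_filter(applied_filters: List[Dict[str, object]], labels: Sequence[str], criterion: str) -> None:
    """Append an entry in the documented applied_filters format."""
    applied_filters.append({"labels": list(labels), "criterion": criterion})
```

For example, with items {"img1": {"dog"}, "img2": {"dog", "cat"}}, filtering on ["dog", "cat"] keeps both items under “OR” but only img2 under “AND”.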
- group_labels(labels_to_group: List[str], group_name: str) → None
Group multiple labels into one. Grouping converts the list of labels into one single label named group_name.
This method does not return anything; instead, it overrides the label map for the datumaro dataset to account for the grouping.
- Parameters:
labels_to_group – List of label names that should be grouped together
group_name – Name of the resulting label
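Because grouping works by overriding the label map, it can be pictured as renaming entries of an id-to-name map: the datumaro ids stay as they are, and every id whose name is in labels_to_group now maps to group_name. A hypothetical sketch (group_label_map is not a geti_sdk function); its output has the same Dict[int, str] shape that override_label_map accepts.

```python
from typing import Dict, List


def group_label_map(label_map: Dict[int, str], labels_to_group: List[str], group_name: str) -> Dict[int, str]:
    """Hypothetical helper: rename every label in labels_to_group to group_name."""
    to_group = set(labels_to_group)
    return {
        label_id: group_name if name in to_group else name
        for label_id, name in label_map.items()
    }
```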
- get_annotation_stats() → Dict[str, Dict[str, int]]
Return the object and image counts per label in the dataset.
- Returns:
Dictionary with label names as keys and, as values, a dictionary containing:
- n_images: Number of images containing this label
- n_objects: Number of independent objects with this label
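To make the difference between the two counts concrete: an image with two dogs contributes 1 to n_images but 2 to n_objects for the label dog. The sketch below computes both from a plain mapping of image name to the label of every annotated object; the input layout and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def annotation_stats(items: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: per-label image and object counts.

    `items` maps an image name to the label of every object annotated in it.
    """
    n_objects: Counter = Counter()
    n_images: Counter = Counter()
    for object_labels in items.values():
        n_objects.update(object_labels)
        # Each image counts at most once per label, however many objects it has
        n_images.update(set(object_labels))
    return {
        label: {"n_images": n_images[label], "n_objects": n_objects[label]}
        for label in sorted(n_objects)
    }
```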
- class geti_sdk.annotation_readers.directory_tree_annotation_reader.DirectoryTreeAnnotationReader(base_data_folder: str, subset_folder_names: Sequence[str] | None = None, task_type: TaskType | str = TaskType.CLASSIFICATION)
Bases: AnnotationReader
AnnotationReader for loading single-label classification annotations from a dataset organized in a directory tree. This annotation reader expects images to be placed in folders, where the name of each folder corresponds to the label that should be assigned to all images inside it.
- Parameters:
base_data_folder – Root of the directory tree that contains the dataset
subset_folder_names – Optional list of subfolders of the base_data_folder that should not be considered as labels, but should be used to acquire the data. For example [‘train’, ‘validation’, ‘test’] for a dataset that is split into three subsets.
task_type – TaskType for the task in the Intel® Geti™ platform to which the annotations should be uploaded
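For instance, with subset_folder_names=['train', 'test'], a file at train/dog/1.jpg would be labeled dog, while train itself is never treated as a label. The sketch below assumes exactly one level of label folders under each subset folder (or under base_data_folder when no subsets are given) and performs no extension filtering; collect_directory_labels is a hypothetical name, not part of DirectoryTreeAnnotationReader.

```python
from pathlib import Path
from typing import Dict, Optional, Sequence


def collect_directory_labels(base_data_folder: str,
                             subset_folder_names: Optional[Sequence[str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map image paths (relative, POSIX style) to their parent folder name."""
    base = Path(base_data_folder)
    # Subset folders are only traversed; they are never used as labels
    roots = [base / name for name in subset_folder_names] if subset_folder_names else [base]
    labels: Dict[str, str] = {}
    for root in roots:
        for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for image in sorted(label_dir.iterdir()):
                if image.is_file():
                    labels[image.relative_to(base).as_posix()] = label_dir.name
    return labels
```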
- property label_map: Dict[str, str]
Return the label map for the dataset, mapping the root label names (keys) to potential new label names (values). It is used to filter or group the dataset.
If no filters or grouping have been applied, every key maps to itself, e.g. {“dog”: “dog”}.
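A sketch of two states of this label map, using plain dictionaries and hypothetical helper names: the identity map returned when nothing has been applied, and the map after grouping. It assumes grouping matches on the current (possibly already grouped) label name, so a second grouping can build on the first; the library's exact behavior for repeated grouping is not documented here.

```python
from typing import Dict, Iterable


def identity_label_map(label_names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: label map with no filters or grouping applied."""
    return {name: name for name in label_names}


def apply_grouping(label_map: Dict[str, str], labels_to_group: Iterable[str], group_name: str) -> Dict[str, str]:
    """Hypothetical helper: point every root label whose current name is grouped at group_name."""
    grouped = set(labels_to_group)
    return {
        root: group_name if current in grouped else current
        for root, current in label_map.items()
    }
```

Since apply_grouping returns a new dictionary, recovering the original dataset only requires rebuilding the identity map.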
- reset_filters_and_grouping()
Reset the applied filters and grouping, to recover the original dataset
- get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False, image_name_as_full_path: bool = False) → List[Annotation]
Return the list of annotations for the media item with name filename
- Parameters:
filename – Name of the item to return the annotations for
label_name_to_id_mapping – Dictionary mapping the name of a label to its unique database ID
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels – Unused parameter in this type of annotation reader
image_name_as_full_path – Set to True if the filename contains the full path to the image
- Returns:
A list of Annotation objects for the media item
- get_all_label_names() → List[str]
Identify all label names contained in the dataset.
- get_data_filenames() → List[str]
Return a list of annotated media files found in the dataset.
- Returns:
List of filenames (excluding extension) for all annotated files in the data folder
- filter_dataset(labels: Sequence[str], criterion: str = 'OR') → None
Retain only those items with annotations in the list of labels passed.
- Parameters:
labels – List of labels to filter on
criterion – Unused parameter for this type of annotation reader
- group_labels(labels_to_group: List[str], group_name: str) → None
Group multiple labels into one. Grouping converts the list of labels into one single label named group_name.
This method does not return anything, but instead overrides the label map for the annotation reader to account for the grouping.
- Parameters:
labels_to_group – List of label names that should be grouped together
group_name – Name of the resulting label
- get_annotation_stats() → Dict[str, Dict[str, int]]
Return the image counts per label in the dataset.
- Returns:
Dictionary with label names as keys and, as values, a dictionary containing:
- n_images: Number of images containing this label