geti_sdk.annotation_readers

Introduction

The annotation_readers package contains the AnnotationReader base class, which provides an interface for implementing custom annotation readers.

Annotation readers serve to load annotation files in custom formats and convert them to Intel® Geti™ format, so that they can be uploaded to an Intel® Geti™ project.

Module contents

class geti_sdk.annotation_readers.base_annotation_reader.AnnotationReader(base_data_folder: str, annotation_format: str = '.json', task_type: TaskType | str = TaskType.DETECTION, anomaly_reduction: bool = False)

Bases: object

Base class for annotation reading. It handles loading annotations and converting them to Intel® Geti™ format.

abstract get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]: Get annotation data for a certain filename

get_data_filenames() → List[str]

Return a list of annotation files found in the base_data_folder.

Returns: List of filenames (excluding extension) for all annotation files in the data folder

abstract get_all_label_names() → List[str]: Return a list of unique label names that were found in the annotation data folder belonging to this AnnotationReader instance.

prepare_and_set_dataset(task_type: TaskType | str, previous_task_type: TaskType | None = None) → None

Prepare a dataset for uploading annotations for a certain task_type.

Parameters:

task_type – TaskType to prepare the dataset for
previous_task_type – Optional type of the (trainable) task preceding the current task in the pipeline. This is only used for global tasks

property applied_filters: List[Dict[str, List[str] | str]]

Return a list of dictionaries representing the filter settings that have been applied to the dataset, if any.

Dictionaries in this list contain two keys:

‘labels’ – List of label names that have been filtered on
‘criterion’ – String representing the criterion that has been used in the filtering. Can be ‘OR’, ‘AND’, ‘XOR’ or ‘NOT’.

Returns: List of filter settings that have been applied to the dataset. Returns an empty list if no filters have been applied.
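
A custom reader has to implement the two abstract methods listed above. The sketch below illustrates that pattern without importing geti_sdk: `AnnotationReaderSketch` is a local stand-in that only mirrors the documented method names, the one-label-per-line text format is invented for the example, and `get_data` returns plain dictionaries where a real reader would return Annotation objects. Treat it as an outline under those assumptions, not as the SDK's implementation.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class AnnotationReaderSketch(ABC):
    """Local stand-in mirroring the documented interface (NOT the geti_sdk class)."""

    def __init__(self, base_data_folder: str, annotation_format: str = ".json"):
        self.base_folder = Path(base_data_folder)
        self.annotation_format = annotation_format

    def get_data_filenames(self) -> List[str]:
        # Filenames are returned without their extension, as documented
        pattern = f"*{self.annotation_format}"
        return sorted(path.stem for path in self.base_folder.glob(pattern))

    @abstractmethod
    def get_data(
        self,
        filename: str,
        label_name_to_id_mapping: dict,
        media_information=None,
        preserve_shape_for_global_labels: bool = False,
    ) -> List[dict]:
        """Return the annotations for one dataset item."""

    @abstractmethod
    def get_all_label_names(self) -> List[str]:
        """Return the unique label names found in the data folder."""


class TextLabelReader(AnnotationReaderSketch):
    """Hypothetical format: each annotation file holds one label name per line."""

    def _read_labels(self, filename: str) -> List[str]:
        path = self.base_folder / f"{filename}{self.annotation_format}"
        lines = path.read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]

    def get_data(
        self,
        filename,
        label_name_to_id_mapping,
        media_information=None,
        preserve_shape_for_global_labels=False,
    ):
        # Plain dicts stand in for Annotation objects in this sketch
        return [
            {"label_name": name, "label_id": label_name_to_id_mapping[name]}
            for name in self._read_labels(filename)
        ]

    def get_all_label_names(self):
        names = set()
        for filename in self.get_data_filenames():
            names.update(self._read_labels(filename))
        return sorted(names)


with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "img_001.txt").write_text("cat\ndog\n")
    (Path(folder) / "img_002.txt").write_text("dog\n")
    reader = TextLabelReader(folder, annotation_format=".txt")
    filenames = reader.get_data_filenames()
    label_names = reader.get_all_label_names()
    annotations = reader.get_data("img_001", {"cat": "id-1", "dog": "id-2"})
```

In real code the subclass would derive from geti_sdk's AnnotationReader and keep the full documented `get_data` signature.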

class geti_sdk.annotation_readers.geti_annotation_reader.GetiAnnotationReader(base_data_folder: str, annotation_format: str = '.json', task_type: TaskType | str | None = None, label_names_to_include: List[str] | None = None, anomaly_reduction: bool = False)

Bases: AnnotationReader

AnnotationReader for loading annotation files in Intel® Geti™ format.

get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]

Return the annotation data for the dataset item corresponding to filename.

Parameters:

filename – name of the item to get the annotation data for.
label_name_to_id_mapping – mapping of label name to label id.
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels –
False to convert shapes for global tasks to full rectangles (required for classification-like tasks in Intel® Geti™ projects), True to preserve such shapes. This parameter should be:
- False when uploading annotations to a single-task project
- True when uploading annotations for a classification-like task that follows a local task in a task chain project.

Returns:

List of Annotation objects containing all annotations for the given dataset item.
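
To make the effect of preserve_shape_for_global_labels concrete, the sketch below uses plain dictionaries as hypothetical stand-ins for shapes, and bare width and height values in place of a MediaInformation object. The helper `shapes_for_upload` is invented for this illustration and is not part of the SDK.

```python
from typing import Dict, List


def shapes_for_upload(
    shapes: List[Dict[str, int]],
    media_width: int,
    media_height: int,
    preserve_shape_for_global_labels: bool = False,
) -> List[Dict[str, int]]:
    """Keep shapes as they are, or replace each one with a full-image rectangle."""
    if preserve_shape_for_global_labels:
        # Task chain case: the shape produced by the local task is kept
        return [dict(shape) for shape in shapes]
    # Single-task case: a global label covers the whole media item
    full_rectangle = {"x": 0, "y": 0, "width": media_width, "height": media_height}
    return [dict(full_rectangle) for _ in shapes]


box = {"x": 10, "y": 20, "width": 30, "height": 40}
converted = shapes_for_upload([box], media_width=640, media_height=480)
preserved = shapes_for_upload(
    [box], media_width=640, media_height=480, preserve_shape_for_global_labels=True
)
```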

get_all_label_names() → List[str]

Retrieve the unique label names for all annotations in the annotation folder

Returns: List of label names

class geti_sdk.annotation_readers.datumaro_annotation_reader.datumaro_annotation_reader.DatumAnnotationReader(base_data_folder: str, annotation_format: str, task_type: TaskType | str = TaskType.DETECTION)

Bases: AnnotationReader

Class to read annotations using datumaro

prepare_and_set_dataset(task_type: TaskType | str, previous_task_type: TaskType | None = None) → None

Prepare the dataset for a specific task_type. This could involve for example conversion of annotation shapes.

Parameters:

task_type – TaskType for which to prepare the dataset.
previous_task_type – TaskType preceding the task to prepare the dataset for

convert_labels_to_segmentation_names() → None: Convert the label names in a dataset to ‘*_shape’, where * is the original label name. This can be used to generate unique label names for the segmentation task in a detection_to_segmentation project.

get_all_label_names() → List[str]: Retrieve the list of label names from a datumaro dataset.

property datum_label_map: Dict[int, str]

Returns: Dictionary mapping the datumaro label id to the label name

override_label_map(new_label_map: Dict[int, str]): Override the label map defined in the datumaro dataset

reset_label_map(): Reset the label map back to the original one from the datumaro dataset.
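
The relation between the label map, an override, and a reset can be sketched with a small stand-alone class. `LabelMapHolder` is hypothetical and only borrows the documented method names; how DatumAnnotationReader stores its map internally is not shown in this reference.

```python
from typing import Dict


class LabelMapHolder:
    """Hypothetical holder that keeps the original map so it can be restored."""

    def __init__(self, original: Dict[int, str]):
        self._original = dict(original)
        self.label_map = dict(original)

    def override_label_map(self, new_label_map: Dict[int, str]) -> None:
        # Replace the active map; the original stays untouched
        self.label_map = dict(new_label_map)

    def reset_label_map(self) -> None:
        # Go back to the map the dataset was loaded with
        self.label_map = dict(self._original)

    def convert_labels_to_segmentation_names(self) -> None:
        # Append '_shape' to every label name, keeping the ids
        self.override_label_map(
            {idx: f"{name}_shape" for idx, name in self.label_map.items()}
        )


holder = LabelMapHolder({0: "car", 1: "person"})
holder.convert_labels_to_segmentation_names()
renamed = dict(holder.label_map)
holder.reset_label_map()
restored = dict(holder.label_map)
```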

get_all_image_names() → List[str]: Return a list of image names in the dataset

get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False) → List[Annotation]

Return the annotation data for the dataset item corresponding to filename.

Parameters:

filename – name of the item to get the annotation data for.
label_name_to_id_mapping – mapping of label name to label id.
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels –
False to convert shapes for global tasks to full rectangles (required for classification-like tasks in Intel® Geti™ projects), True to preserve such shapes. This parameter should be:
- False when uploading annotations to a single-task project
- True when uploading annotations for a classification-like task that follows a local task in a task chain project.

Returns:

List of Annotation objects containing all annotations for the given dataset item.

property applied_filters: List[Dict[str, List[str] | str]]: Return a list of filters and their parameters that have been previously applied to the dataset.

filter_dataset(labels: Sequence[str], criterion='OR') → None

Retain only those items with annotations in the list of labels passed.

Parameters:

labels – List of labels to filter on
criterion – Filter criterion; currently “OR” and “AND” are implemented
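
The two implemented criteria can be illustrated on a toy dataset. The sketch assumes that “OR” keeps items carrying at least one of the given labels and “AND” keeps items carrying all of them; that reading is an assumption, and `filter_items` is a hypothetical helper rather than the SDK method. The `applied_filters` list mimics the structure described for the applied_filters property.

```python
from typing import Dict, List, Sequence, Set


def filter_items(
    items: Dict[str, Set[str]], labels: Sequence[str], criterion: str = "OR"
) -> Dict[str, Set[str]]:
    """Return the items whose label set satisfies the criterion."""
    wanted = set(labels)
    if criterion == "OR":
        # Assumed meaning: at least one of the requested labels is present
        return {name: found for name, found in items.items() if found & wanted}
    if criterion == "AND":
        # Assumed meaning: all requested labels are present
        return {name: found for name, found in items.items() if wanted <= found}
    raise ValueError(f"Unsupported criterion: {criterion}")


items = {
    "img_a": {"cat"},
    "img_b": {"cat", "dog"},
    "img_c": {"bird"},
}
applied_filters: List[Dict[str, object]] = []

either = filter_items(items, ["cat", "dog"], criterion="OR")
applied_filters.append({"labels": ["cat", "dog"], "criterion": "OR"})
both = filter_items(items, ["cat", "dog"], criterion="AND")
```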

group_labels(labels_to_group: List[str], group_name: str) → None

Group multiple labels into one. Grouping converts the list of labels into one single label named group_name.

This method does not return anything, but instead overrides the label map for the datumaro dataset to account for the grouping.

Parameters:

labels_to_group – List of label names that should be grouped together
group_name – Name of the resulting label

get_annotation_stats() → Dict[str, Dict[str, int]]

Return the object and image counts per label in the dataset.

Returns: Dictionary containing label names as keys, and as values a dictionary with:
- n_images: Number of images containing this label
- n_objects: Number of independent objects with this label
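
The difference between the two counters is easiest to see on a small example. The function below is a hypothetical stand-alone sketch that takes a mapping from image name to the labels of the objects in that image; it is not the SDK's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def annotation_stats(items: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Count, per label, the images containing it and its individual objects."""
    stats = defaultdict(lambda: {"n_images": 0, "n_objects": 0})
    for object_labels in items.values():
        for label in object_labels:
            # Every annotated object counts once
            stats[label]["n_objects"] += 1
        for label in set(object_labels):
            # Every image counts once per label, however many objects it holds
            stats[label]["n_images"] += 1
    return dict(stats)


stats = annotation_stats(
    {
        "img_001": ["car", "car", "person"],
        "img_002": ["car"],
    }
)
```

Here "car" appears as three objects spread over two images, so n_objects is 3 and n_images is 2.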

class geti_sdk.annotation_readers.directory_tree_annotation_reader.DirectoryTreeAnnotationReader(base_data_folder: str, subset_folder_names: Sequence[str] | None = None, task_type: TaskType | str = TaskType.CLASSIFICATION)

Bases: AnnotationReader

AnnotationReader for loading single-label classification annotations from a dataset organized in a directory tree. This annotation reader expects images to be placed in folders, where the name of each image folder corresponds to the label that should be assigned to all images inside it.

Parameters:

base_data_folder – Root of the directory tree that contains the dataset
subset_folder_names – Optional list of subfolders of the base_data_folder that should not be considered as labels, but should be used to acquire the data. For example [‘train’, ‘validation’, ‘test’] for a dataset that is split into three subsets.
task_type – TaskType for the task in the Intel® Geti™ platform to which the annotations should be uploaded
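
The expected folder layout, and the role of subset_folder_names, can be sketched with the standard library alone. `collect_labels` is a hypothetical helper written for this illustration; the folder and file names are made up, and the SDK's own traversal may differ in detail.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def collect_labels(
    base_data_folder: str, subset_folder_names: Optional[Sequence[str]] = None
) -> Dict[str, List[str]]:
    """Map each label (folder name) to the image names found inside it."""
    base = Path(base_data_folder)
    # Subset folders are entered to find data, but are not labels themselves
    if subset_folder_names:
        roots = [base / name for name in subset_folder_names]
    else:
        roots = [base]
    images_per_label: Dict[str, List[str]] = {}
    for root in roots:
        for label_dir in sorted(path for path in root.iterdir() if path.is_dir()):
            images = sorted(f.stem for f in label_dir.iterdir() if f.is_file())
            images_per_label.setdefault(label_dir.name, []).extend(images)
    return images_per_label


with tempfile.TemporaryDirectory() as folder:
    layout = [("train", "dog", "a.jpg"), ("train", "cat", "b.jpg"), ("test", "dog", "c.jpg")]
    for subset, label, image in layout:
        target = Path(folder) / subset / label
        target.mkdir(parents=True, exist_ok=True)
        (target / image).touch()
    images_per_label = collect_labels(folder, subset_folder_names=["train", "test"])
    labels_without_subsets = sorted(collect_labels(folder))
```

Without subset_folder_names, the subset folders themselves would be picked up as labels, which is what the last line shows.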

property label_map: Dict[str, str]

Return the label map for the dataset, mapping the root label names (keys) to potential new label names (values). It is used to filter or group the dataset.

If no filters or grouping have been applied, it returns a dictionary in which every key maps to itself, e.g. {“dog”: “dog”}

reset_filters_and_grouping(): Reset the applied filters and grouping, to recover the original dataset

get_data(filename: str, label_name_to_id_mapping: dict, media_information: MediaInformation, preserve_shape_for_global_labels: bool = False, image_name_as_full_path: bool = False) → List[Annotation]

Return the list of annotations for the media item with name filename

Parameters:

filename – Name of the item to return the annotations for
label_name_to_id_mapping – Dictionary mapping the name of a label to its unique database ID
media_information – MediaInformation object containing information (e.g. width, height) about the media item to upload the annotation for
preserve_shape_for_global_labels – Unused parameter in this type of annotation reader
image_name_as_full_path – Set to True if the filename contains the full path to the image

Returns:

A list of Annotation objects for the media item

get_all_label_names() → List[str]: Identify all label names contained in the dataset

get_data_filenames() → List[str]

Return a list of annotated media files found in the dataset.

Returns: List of filenames (excluding extension) for all annotated files in the data folder

filter_dataset(labels: Sequence[str], criterion: str = 'OR') → None

Retain only those items with annotations in the list of labels passed.

Parameters:

labels – List of labels to filter on
criterion – Unused parameter for this type of annotation reader

group_labels(labels_to_group: List[str], group_name: str) → None

Group multiple labels into one. Grouping converts the list of labels into one single label named group_name.

This method does not return anything, but instead overrides the label map for the annotation reader to account for the grouping.

Parameters:

labels_to_group – List of label names that should be grouped together
group_name – Name of the resulting label
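
Grouping can be pictured as an update of the label map described for the label_map property, which maps root label names to their possibly renamed counterparts. The function below is a hypothetical sketch that returns a new dictionary, whereas the documented method modifies the reader in place.

```python
from typing import Dict, List


def group_labels(
    label_map: Dict[str, str], labels_to_group: List[str], group_name: str
) -> Dict[str, str]:
    """Point every label in labels_to_group at group_name; leave the rest as is."""
    return {
        root: (group_name if current in labels_to_group else current)
        for root, current in label_map.items()
    }


# Identity map, as returned when no filters or grouping have been applied
label_map = {"dog": "dog", "cat": "cat", "car": "car"}
grouped = group_labels(label_map, ["dog", "cat"], group_name="animal")
resulting_labels = sorted(set(grouped.values()))
```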

get_annotation_stats() → Dict[str, Dict[str, int]]

Return the image counts per label in the dataset.

Returns: Dictionary containing label names as keys, and as values a dictionary with:
- n_images: Number of images containing this label