datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer#

Classes

`LossDynamicsAnalyzer`(dataset[, alpha, ...])	A class for analyzing the dynamics of training loss to identify noisy labels.
`NoisyLabelCandidate`(id, subset, ann_id, ...)

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.LossDynamicsAnalyzer(dataset: IDataset, alpha: float = 0.001, tracking_loss_type: str | None = None)[source]#

Bases: object

A class for analyzing the dynamics of training loss to identify noisy labels.

This class parses the dataset to extract the training loss dynamics recorded for each sample and computes their exponential moving average (EMA). A higher EMA value can indicate a noisily labeled sample [1]. The class provides an interface for extracting the top-k noisy label candidates based on these statistics. It can also plot the EMA curves of the candidates' loss dynamics, so that they can be compared with the dataset’s overall average or with averages grouped by label.

[1] Zhou, Tianyi, Shengjie Wang, and Jeff Bilmes. “Robust curriculum learning: from clean label detection to noisy label self-correction.” International Conference on Learning Representations. 2021.

allowed_task_names = {'OTX-Det', 'OTX-MultiClassCls'}#

allowed_tracking_loss_types = {'bbox', 'bbox_refine', 'centerness', 'cls'}#

property alpha: float#

The smoothing factor used to compute the EMA loss dynamics statistics:

ema_loss_dyns(t) := (1 - alpha) * ema_loss_dyns(t - 1) + alpha * loss_dyns(t)
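The recurrence above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: the function name `ema_loss_dynamics` is hypothetical, and seeding the EMA with the first observed loss is an assumption, since the formula does not state an initial value.

```python
from typing import List, Sequence


def ema_loss_dynamics(loss_dyns: Sequence[float], alpha: float = 0.001) -> List[float]:
    """Apply ema(t) = (1 - alpha) * ema(t - 1) + alpha * loss(t) to a loss sequence.

    Assumption: ema(0) is seeded with the first observed loss value.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ema: List[float] = []
    for loss in loss_dyns:
        if not ema:
            ema.append(float(loss))  # assumed seed value
        else:
            ema.append((1.0 - alpha) * ema[-1] + alpha * float(loss))
    return ema


if __name__ == "__main__":
    # With alpha=0.25, a spike to 5.0 pulls the EMA a quarter of the way toward it.
    print(ema_loss_dynamics([1.0, 1.0, 5.0], alpha=0.25))
```

A small alpha, such as the default of 0.001, makes the EMA react slowly to individual epochs, so a persistently high value reflects a sample whose loss stayed high over much of training.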

property mean_loss_dyns: Series#: Pandas Series object obtained by averaging all EMA loss dynamics statistics.

property mean_loss_dyns_per_label: Dict[LabelCategories.Category, Series]#: A dictionary of Pandas Series objects obtained by averaging the EMA loss dynamics statistics for each label category.

property ema_dataframe: DataFrame#: Pandas DataFrame including full EMA loss dynamics statistics.

get_top_k_cands(top_k: int) → List[NoisyLabelCandidate][source]#

Return a list of top-k noisy label candidates.

Parameters:

top_k (int) – The number of candidates to return.

Returns:

A list of the top-k noisy label candidates, sorted in descending order by the last value of the EMA training loss dynamics.
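The ordering described above can be illustrated with a small stand-in. `Candidate` and `top_k_by_metric` are hypothetical names used only for this sketch. It assumes that `metric` holds the last EMA value for each annotation and does not reproduce how the library computes it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical stand-in with the same fields as NoisyLabelCandidate."""

    id: str
    subset: str
    ann_id: int
    label_id: int
    metric: float  # assumed to be the last EMA loss value


def top_k_by_metric(cands: Sequence[Candidate], top_k: int) -> List[Candidate]:
    """Return the top_k candidates with the largest metric, in descending order."""
    return sorted(cands, key=lambda c: c.metric, reverse=True)[:top_k]


if __name__ == "__main__":
    pool = [
        Candidate("img_a", "train", 0, 1, 0.12),
        Candidate("img_b", "train", 0, 2, 0.87),
        Candidate("img_c", "train", 1, 1, 0.45),
    ]
    for cand in top_k_by_metric(pool, top_k=2):
        print(cand.id, cand.metric)
```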

plot_ema_loss_dynamics(candidates: Sequence[NoisyLabelCandidate], mode: str = 'mean', mean_plot_style: str = '--', mean_plot_color: str = 'k', figsize: Tuple[int, int] = (4, 3), **kwargs) → Figure[source]#

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.NoisyLabelCandidate(id: 'str', subset: 'str', ann_id: 'int', label_id: 'int', metric: 'float')[source]#

Bases: object

id: str#

subset: str#

ann_id: int#

label_id: int#

metric: float#

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.AnnotationType(value)[source]#

Bases: IntEnum

An enumeration.

unknown = 0#

label = 1#

mask = 2#

points = 3#

polygon = 4#

polyline = 5#

bbox = 6#

caption = 7#

cuboid_3d = 8#

super_resolution_annotation = 9#

depth_annotation = 10#

ellipse = 11#

hash_key = 12#

feature_vector = 13#

tabular = 14#

rotated_bbox = 15#

exception datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.DatasetError[source]#: Bases: DatumaroError

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.IDataset[source]#

Bases: object

subsets() → Dict[str, IDataset][source]#: Enumerates subsets in the dataset. Each subset can be a dataset itself.

get_subset(name) → IDataset[source]#

infos() → Dict[str, Any][source]#: Returns meta-information about the dataset.

categories() → Dict[AnnotationType, Categories][source]#: Returns meta-information about the dataset labels.

get(id: str, subset: str | None = None) → DatasetItem | None[source]#: Provides random access to dataset items.

media_type() → Type[MediaElement][source]#

Returns media type of the dataset items.

All items are expected to have the same media type. The value is expected to be constant and known immediately after object construction (i.e., it does not require iterating over the dataset).

task_type() → TaskType[source]#: Returns available task type from dataset annotation types.

property is_stream: bool#

Boolean indicating whether the dataset is a stream.

If the dataset is a stream, its items are generated on demand from its iterator.

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.LabelCategories(items: List[str] = _Nothing.NOTHING, label_groups: List[LabelGroup] = _Nothing.NOTHING, *, attributes: Set[str] = _Nothing.NOTHING)[source]#

Bases: Categories

Method generated by attrs for class LabelCategories.

class Category(name, parent: str = '', attributes: Set[str] = _Nothing.NOTHING)[source]#

Bases: object

Method generated by attrs for class LabelCategories.Category.

name: str#

parent: str#

attributes: Set[str]#

class LabelGroup(name, labels: List[str] = [], group_type: GroupType = GroupType.EXCLUSIVE)[source]#

Bases: object

Method generated by attrs for class LabelCategories.LabelGroup.

name: str#

labels: List[str]#

group_type: GroupType#

items: List[str]#

label_groups: List[LabelGroup]#

classmethod from_iterable(iterable: Iterable[str | Tuple[str] | Tuple[str, str] | Tuple[str, str, List[str]]]) → LabelCategories[source]#

Creates a LabelCategories from iterable.

Parameters:

iterable – This iterable object can be:

a list of str, which will be interpreted as a list of Category names
a list of positional arguments, which will be used to generate Categories

Returns: a LabelCategories object

add(name: str, parent: str | None = None, attributes: Set[str] | None = None) → int[source]#

add_label_group(name: str, labels: List[str], group_type: GroupType) → int[source]#

find(name: str) → Tuple[int | None, Category | None][source]#

datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False)[source]#

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, a __slots__ attribute is added.

class datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.defaultdict#

Bases: dict

defaultdict(default_factory=None, /, [...]) --> dict with default factory

The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.

copy() → a shallow copy of D.#

default_factory#: Factory for default value called by __missing__().
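As a brief illustration of the behaviour described above, the sketch below groups loss values by label ID using the standard-library `collections.defaultdict`. The helper name `group_by_label` and the sample data are made up for the example. They are not taken from this module.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_label(pairs: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Collect loss values per label ID; missing keys start as empty lists."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for label_id, loss in pairs:
        groups[label_id].append(loss)  # __getitem__ triggers the factory
    return groups


if __name__ == "__main__":
    groups = group_by_label([(0, 0.5), (1, 0.25), (0, 1.5)])
    print(dict(groups))
    # .get() does not call the factory, so no new key is created here.
    print(groups.get(7), 7 in groups)
```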

datumaro.components.algorithms.noisy_label_detection.loss_dynamics_analyzer.field(*, default=<dataclasses._MISSING_TYPE object>, default_factory=<dataclasses._MISSING_TYPE object>, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=<dataclasses._MISSING_TYPE object>)[source]#

Return an object to identify dataclass fields.

default is the default value of the field. default_factory is a 0-argument function called to initialize a field’s value. If init is true, the field will be a parameter to the class’s __init__() function. If repr is true, the field will be included in the object’s repr(). If hash is true, the field will be included in the object’s hash(). If compare is true, the field will be used in comparison functions. metadata, if specified, must be a mapping which is stored but not otherwise examined by dataclass. If kw_only is true, the field will become a keyword-only parameter to __init__().

It is an error to specify both default and default_factory.
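A short sketch of how `dataclass` and `field` fit together, using only the standard library. `LossRecord` and `both_defaults_rejected` are hypothetical names introduced for illustration. They are not part of this module.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LossRecord:
    """Hypothetical record type, used only to illustrate dataclass and field."""

    item_id: str
    # default_factory gives each instance its own list.
    losses: List[float] = field(default_factory=list)
    # compare=False excludes this field from the generated __eq__.
    note: str = field(default="", compare=False)


def both_defaults_rejected() -> bool:
    """Return True if field() rejects default together with default_factory."""
    try:
        field(default=0, default_factory=int)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    a = LossRecord("img_a")
    b = LossRecord("img_a", note="re-checked")
    a.losses.append(0.5)
    print(a, b, a == b)
```

Using `default_factory` rather than a mutable default value avoids sharing one list across all instances.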