datumaro.components.algorithms.rise#

Classes

RISE(model[, num_masks, mask_size, prob, ...])

Implements the RISE (Randomized Input Sampling for Explanation of Black-box Models) algorithm.

class datumaro.components.algorithms.rise.RISE(model, num_masks: int = 100, mask_size: int = 7, prob: float = 0.5, batch_size: int = 1)[source]#

Bases: object

Implements the RISE (Randomized Input Sampling for Explanation of Black-box Models) algorithm. See the paper for details: https://arxiv.org/pdf/1806.07421.pdf

normalize_saliency(saliency)[source]#
generate_masks(image_size)[source]#
generate_masked_dataset(image, image_size, masks)[source]#
apply(image, progressive=False)[source]#
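
The constructor parameters map directly onto the algorithm: generate_masks draws num_masks low-resolution binary grids of mask_size × mask_size cells, each cell kept with probability prob, upsamples them to image resolution, and apply averages the model's scores over the masked inputs to produce a saliency map. A minimal NumPy sketch of this idea (illustrative names, not the Datumaro implementation; nearest-neighbour upsampling is a simplification of the paper's bilinear upsampling with random shifts):

```python
import numpy as np

def rise_saliency(image, score_fn, num_masks=100, mask_size=7, prob=0.5, rng=None):
    """Score-weighted average of random occlusion masks (RISE, arXiv:1806.07421)."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    saliency = np.zeros((h, w), dtype=np.float64)
    cell = (h // mask_size + 1, w // mask_size + 1)
    for _ in range(num_masks):
        # Low-resolution binary grid: each cell is kept with probability `prob`.
        grid = (rng.random((mask_size, mask_size)) < prob).astype(np.float64)
        # Nearest-neighbour upsampling of the grid to image resolution.
        mask = np.kron(grid, np.ones(cell))[:h, :w]
        # Score the masked image and accumulate the score-weighted mask.
        masked = image * mask[..., None] if image.ndim == 3 else image * mask
        saliency += score_fn(masked) * mask
    return saliency / num_masks

# Toy usage: the "model" just measures how much of a bright square survives masking,
# so saliency should concentrate on that square.
img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0
sal = rise_saliency(img, score_fn=lambda x: float(x.sum()), num_masks=200, rng=0)
```

With more masks the estimate of each pixel's expected score contribution becomes smoother, which is why num_masks trades accuracy against inference cost.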
class datumaro.components.algorithms.rise.Dataset(source: IDataset | None = None, *, infos: Dict[str, Any] | None = None, categories: Dict[AnnotationType, Categories] | None = None, media_type: Type[MediaElement] | None = None, task_type: TaskType | None = None, env: Environment | None = None)[source]#

Bases: IDataset

Represents a dataset, contains metainfo about labels and dataset items. Provides iteration and access options to dataset elements.

By default, all operations are done lazily; this can be changed by modifying the eager property or by using the eager_mode context manager.

Dataset is supposed to have a single media type for its items. If the dataset is filled manually or from extractors, and media type does not match, an error is raised.

classmethod from_iterable(iterable: ~typing.Iterable[~datumaro.components.dataset_base.DatasetItem], infos: ~typing.Dict[str, ~typing.Any] | None = None, categories: ~typing.Dict[~datumaro.components.annotation.AnnotationType, ~datumaro.components.annotation.Categories] | ~typing.List[str] | None = None, *, env: ~datumaro.components.environment.Environment | None = None, media_type: ~typing.Type[~datumaro.components.media.MediaElement] = <class 'datumaro.components.media.Image'>, task_type: ~datumaro.components.task.TaskType | None = TaskType.unlabeled) Dataset[source]#

Creates a new dataset from an iterable object producing dataset items (a generator, a list, etc.). It is a convenient way to create and fill a custom dataset.

Parameters:
  • iterable – An iterable which returns dataset items

  • infos – A dictionary of the dataset specific information

  • categories – A simple list of labels or complete information about labels. If not specified, an empty list of labels is assumed.

  • media_type – Media type for the dataset items. If the sequence contains items with mismatching media type, an error is raised during caching

  • env – A context for plugins, which will be used for this dataset. If not specified, the builtin plugins will be used.

Returns:

A new dataset with specified contents

Return type:

Dataset

classmethod from_extractors(*sources: IDataset, env: Environment | None = None, merge_policy: str = 'exact') Dataset[source]#

Creates a new dataset from one or several `Extractor`s.

In case of a single input, creates a lazy wrapper around the input. In case of several inputs, merges them and caches the resulting dataset.

Parameters:
  • sources – one or many input extractors

  • env – A context for plugins, which will be used for this dataset. If not specified, the builtin plugins will be used.

  • merge_policy – Policy on how to merge multiple datasets. Possible options are “exact”, “intersect”, and “union”.

Returns:

A new dataset with contents produced by input extractors

Return type:

Dataset
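
One way to picture the three merge_policy options is by how they combine the label sets of the sources (a deliberate simplification: the real merge also reconciles items and annotations, not just categories). A hedged pure-Python sketch with illustrative names:

```python
def merge_labels(label_sets, policy="exact"):
    """Illustrative semantics of merge_policy, applied to label sets only:
    "exact" requires identical sets, "intersect" keeps common labels,
    "union" keeps all labels. Not Datumaro's actual merge implementation."""
    first, *rest = [set(s) for s in label_sets]
    if policy == "exact":
        if any(s != first for s in rest):
            raise ValueError("exact merge requires identical label sets")
        return sorted(first)
    if policy == "intersect":
        for s in rest:
            first &= s
        return sorted(first)
    if policy == "union":
        for s in rest:
            first |= s
        return sorted(first)
    raise ValueError(f"unknown policy: {policy}")

a, b = ["cat", "dog"], ["dog", "bird"]
union = merge_labels([a, b], "union")          # all labels kept
common = merge_labels([a, b], "intersect")     # only shared labels kept
```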

define_infos(infos: Dict[str, Any]) None[source]#
define_categories(categories: Dict[AnnotationType, Categories]) None[source]#
init_cache() None[source]#
get_subset(name) DatasetSubset[source]#
subsets() Dict[str, DatasetSubset][source]#

Enumerates subsets in the dataset. Each subset can be a dataset itself.

infos() Dict[str, Any][source]#

Returns meta-info of dataset.

categories() Dict[AnnotationType, Categories][source]#

Returns metainfo about dataset labels.

media_type() Type[MediaElement][source]#

Returns media type of the dataset items.

All the items are supposed to have the same media type. Supposed to be constant and known immediately after the object construction (i.e. doesn’t require dataset iteration).

task_type() TaskType[source]#

Returns available task type from dataset annotation types.

get(id: str, subset: str | None = None) DatasetItem | None[source]#

Provides random access to dataset items.

get_annotated_items()[source]#
get_annotations()[source]#
get_datasetitem_by_path(path)[source]#
get_label_cat_names()[source]#
get_subset_info() str[source]#
get_infos() Tuple[str][source]#
get_categories_info() Tuple[str][source]#
put(item: DatasetItem, id: str | None = None, subset: str | None = None) None[source]#
remove(id: str, subset: str | None = None) None[source]#
filter(expr: str, *, filter_annotations: bool = False, remove_empty: bool = False) Dataset[source]#
filter(filter_func: Callable[[DatasetItem], bool] | Callable[[DatasetItem, Annotation], bool], *, filter_annotations: bool = False, remove_empty: bool = False) Dataset
update(source: DatasetPatch | IDataset | Iterable[DatasetItem]) Dataset[source]#

Updates items of the current dataset from another dataset or an iterable (the source). Items from the source overwrite matching items in the current dataset. Unmatched items are just appended.

If the source is a DatasetPatch, the removed items in the patch will be removed in the current dataset.

If the source is a dataset, labels are matched. If the labels match, but the order is different, the annotation labels will be remapped to the current dataset label order during updating.

Returns: self
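
The overwrite-or-append rule above can be sketched with plain dicts keyed by (id, subset), which is how dataset items are matched (an illustrative model, not the actual implementation):

```python
def update_items(current, source):
    # Items are keyed by (id, subset): source entries overwrite matching
    # entries in the current mapping, unmatched ones are appended.
    merged = dict(current)
    merged.update(source)
    return merged

current = {("1", "train"): "old", ("2", "train"): "kept"}
source = {("1", "train"): "new", ("3", "val"): "added"}
result = update_items(current, source)
```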

transform(method: str | Type[Transform], **kwargs) Dataset[source]#

Applies some function to dataset items.

Results are stored in-place. Modifications are applied lazily. Transforms are not allowed to change the media type of dataset items.

Parameters:
  • method – The transformation to be applied to the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the dataset environment.

  • **kwargs – Parameters for the transformation

Returns: self

run_model(model: Launcher | Type[ModelTransform], *, batch_size: int = 1, append_annotation: bool = False, num_workers: int = 0, **kwargs) Dataset[source]#

Applies a model to dataset items’ media and produces a dataset with media and annotations.

Parameters:
  • model – The model to be applied to the dataset

  • batch_size – The number of dataset items processed simultaneously by the model

  • append_annotation – Whether to append new annotations to the existing ones

  • num_workers – The number of worker threads to use for parallel inference. Set to 0 for single-process mode. Default is 0.

  • **kwargs – Parameters for the model

Returns: self

select(pred: Callable[[DatasetItem], bool]) Dataset[source]#
property data_path: str | None#
property format: str | None#
property options: Dict[str, Any]#
property is_modified: bool#
get_patch() DatasetPatch[source]#
property env: Environment#
property is_cache_initialized: bool#
property is_eager: bool#
property is_bound: bool#
bind(path: str, format: str | None = None, *, options: Dict[str, Any] | None = None) None[source]#

Binds the dataset to a specific directory. Allows setting default saving parameters.

Subsequent saves will be done to this directory by default and will use the saved parameters.

flush_changes()[source]#
export(save_dir: str, format: str | Type[Exporter], *, progress_reporter: ProgressReporter | None = None, error_policy: ExportErrorPolicy | None = None, **kwargs) None[source]#

Saves the dataset in some format.

Parameters:
  • save_dir – The output directory

  • format – The desired output format. If a string is passed, it is treated as a plugin name, which is searched for in the dataset environment.

  • progress_reporter – An object to report progress

  • error_policy – An object to report format-related errors

  • **kwargs – Parameters for the format

save(save_dir: str | None = None, **kwargs) None[source]#
classmethod load(path: str, **kwargs) Dataset[source]#
classmethod import_from(path: str, format: str | None = None, *, env: Environment | None = None, progress_reporter: ProgressReporter | None = None, error_policy: ImportErrorPolicy | None = None, **kwargs) Dataset[source]#

Creates a Dataset instance from a dataset on the disk.

Parameters:
  • path – The input file or directory

  • format – Dataset format. If a string is passed, it is treated as a plugin name, which is searched for in the env plugin context. If not set, the format is detected automatically, using the env plugin context.

  • env – A plugin collection. If not set, the built-in plugins are used

  • progress_reporter – An object to report progress. Implies eager loading.

  • error_policy – An object to report format-related errors. Implies eager loading.

  • **kwargs – Parameters for the format

static detect(path: str, *, env: Environment | None = None, depth: int = 2) str[source]#

Attempts to detect dataset format of a given directory.

This function tries to detect a single format and fails if it’s not possible. Check Environment.detect_dataset() for a function that reports status for each format checked.

Parameters:
  • path – The directory to check

  • depth – The maximum depth for recursive search

  • env – A plugin collection. If not set, the built-in plugins are used

property is_stream: bool#

Boolean indicating whether the dataset is a stream

If the dataset is a stream, the dataset item is generated on demand from its iterator.

clone() Dataset[source]#

Create a deep copy of this dataset.

Returns:

A cloned instance of the Dataset.

class datumaro.components.algorithms.rise.DatasetItem(id: str, *, subset: str | None = None, media: str | MediaElement | None = None, annotations: List[Annotation] | None = None, attributes: Dict[str, Any] | None = None)[source]#

Bases: object

id: str#
subset: str#
media: MediaElement | None#
annotations: List[Annotation]#
attributes: Dict[str, Any]#
wrap(**kwargs)[source]#
media_as(t: Type[T]) T[source]#
class datumaro.components.algorithms.rise.Image(size: Tuple[int, int] | None = None, ext: str | None = None, *args, **kwargs)[source]#

Bases: MediaElement[ndarray]

classmethod from_file(path: str, *args, **kwargs)[source]#
classmethod from_numpy(data: ndarray | Callable[[], ndarray], *args, **kwargs)[source]#
classmethod from_bytes(data: bytes | Callable[[], bytes], *args, **kwargs)[source]#
property has_size: bool#

Indicates that size info is cached and won’t require image loading

property size: Tuple[int, int] | None#

Returns (H, W)

property ext: str | None#

Media file extension (with the leading dot)

set_crypter(crypter: Crypter)[source]#
datumaro.components.algorithms.rise.take_by(iterable, count)[source]#

Returns elements from the input iterable in batches of N items: ('abcdefg', 3) -> ['a', 'b', 'c'], ['d', 'e', 'f'], ['g']
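
A possible implementation of such a batching helper, using itertools.islice (a sketch of the behaviour documented above, not necessarily Datumaro's code):

```python
from itertools import islice

def take_by(iterable, count):
    # Yield successive batches of up to `count` items from any iterable;
    # the final batch may be shorter.
    it = iter(iterable)
    while batch := list(islice(it, count)):
        yield batch

batches = list(take_by("abcdefg", 3))
```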