datumaro.components.importer#

Functions

with_subset_dirs(input_cls)

Classes

Importer()

class datumaro.components.importer.ImportContext(progress_reporter=None, error_policy=None)[source]#

Bases: object

Method generated by attrs for class ImportContext.

progress_reporter: ProgressReporter#
error_policy: ImportErrorPolicy#
class datumaro.components.importer.NullImportContext(progress_reporter=None, error_policy=None)[source]#

Bases: ImportContext

Method generated by attrs for class ImportContext.

progress_reporter: ProgressReporter#
error_policy: ImportErrorPolicy#
class datumaro.components.importer.Importer[source]#

Bases: CliPlugin

DETECT_CONFIDENCE = 10#
classmethod detect(context: FormatDetectionContext) FormatDetectionConfidence[source]#
classmethod get_file_extensions() List[str][source]#
classmethod find_sources(path: str) List[Dict][source]#
classmethod find_sources_with_params(path: str, **extra_params) List[Dict][source]#
property can_stream: bool#

Flag indicating whether the importer can stream dataset items or not.

get_extractor_merger() Type[ExtractorMerger] | None[source]#

Returns the extractor merger dedicated to this data format.

The Datumaro import process spawns one DatasetBase per detected source, and several sources may be found under the given directory path. In many data formats, each detected source corresponds to one subset of the dataset.

Parameters:

stream – Behavior may branch depending on the stream flag

Returns:

If None, Dataset.from_extractors() is used to merge the extractors. Otherwise, the returned type is used to merge them.

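A minimal sketch of a custom importer for a hypothetical "myfmt" layout (one annotation file per subset in the dataset root). The class name, the ".myfmt" extension, the "myfmt" format name, and the source-dictionary keys are illustrative assumptions, not part of Datumaro:

import os.path as osp
from glob import glob
from typing import Dict, List, Optional, Type

from datumaro.components.importer import ExtractorMerger, Importer


class MyFormatImporter(Importer):
    @classmethod
    def get_file_extensions(cls) -> List[str]:
        return [".myfmt"]

    @classmethod
    def find_sources(cls, path: str) -> List[Dict]:
        # One source description per annotation file; "format" is assumed to be
        # the name the matching extractor is registered under.
        return [
            {"url": url, "format": "myfmt"}
            for url in sorted(glob(osp.join(path, "*.myfmt")))
        ]

    @property
    def can_stream(self) -> bool:
        return True

    def get_extractor_merger(self) -> Optional[Type[ExtractorMerger]]:
        # Each detected source yields one subset, so the generic ExtractorMerger
        # can combine them; returning None would fall back to Dataset.from_extractors().
        return ExtractorMerger
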
datumaro.components.importer.with_subset_dirs(input_cls: Importer)[source]#
class datumaro.components.importer.ImportErrorPolicy[source]#

Bases: object

report_item_error(error: Exception, *, item_id: Tuple[str, str]) None[source]#

Allows reporting a problem with a dataset item. If this function returns, the extractor must skip the item.

report_annotation_error(error: Exception, *, item_id: Tuple[str, str]) None[source]#

Allows reporting a problem with a dataset item annotation. If this function returns, the extractor must skip the annotation.

fail(error: Exception) NoReturn[source]#
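
For example, a lenient policy might log problems and let the extractor skip the offending entries. This is a hedged sketch based only on the methods documented above; overriding the report_* methods directly is an assumption, and the logger name is illustrative:

import logging
from typing import Tuple

from datumaro.components.importer import ImportErrorPolicy

log = logging.getLogger(__name__)


class LoggingImportErrorPolicy(ImportErrorPolicy):
    def report_item_error(self, error: Exception, *, item_id: Tuple[str, str]) -> None:
        # Returning normally tells the extractor to skip the broken item.
        log.warning("Skipping item %s: %s", item_id, error)

    def report_annotation_error(self, error: Exception, *, item_id: Tuple[str, str]) -> None:
        # Returning normally tells the extractor to skip the broken annotation.
        log.warning("Skipping an annotation of item %s: %s", item_id, error)
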
class datumaro.components.importer.FailingImportErrorPolicy[source]#

Bases: ImportErrorPolicy

class datumaro.components.importer.CliPlugin[source]#

Bases: object

NAME = 'cli_plugin'#
classmethod build_cmdline_parser(**kwargs)[source]#
classmethod parse_cmdline(args=None)[source]#
exception datumaro.components.importer.DatasetImportError[source]#

Bases: DatumaroError

exception datumaro.components.importer.DatasetNotFoundError(path: str, format: str, template: str = "Failed to find dataset '{format}' at '{path}'")[source]#

Bases: DatasetImportError

Method generated by attrs for class DatasetNotFoundError.

path: str#
format: str#
template: str#
class datumaro.components.importer.ExtractorMerger(sources: Sequence[SubsetBase])[source]#

Bases: DatasetBase

A simple class to merge single-subset extractors.

infos() Dict[str, Any][source]#

Returns meta-info of dataset.

categories() Dict[AnnotationType, Categories][source]#

Returns metainfo about dataset labels.

get(id: str, subset: str | None = None) DatasetItem | None[source]#

Provides random access to dataset items.

property is_stream: bool#

Boolean indicating whether the dataset is a stream

If the dataset is a stream, dataset items are generated on demand from its iterator.

class datumaro.components.importer.FormatDetectionConfidence(value)[source]#

Bases: IntEnum

Represents the level of confidence that a detector has in a dataset belonging to the detector’s format.

NONE = 1#
EXTREME_LOW = 5#

This is currently only assigned to the ImageDir format, because an ImageDir layout can be detected in any image dataset format.

LOW = 10#

The dataset seems to belong to the format, but the format is too loosely defined to be able to distinguish it from other formats.

MEDIUM = 20#

The dataset seems to belong to the format, and is likely not to belong to any other format.

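Because the members form an IntEnum with the values listed above, confidences can be compared numerically, e.g.:

from datumaro.components.importer import FormatDetectionConfidence as Confidence

assert Confidence.NONE < Confidence.EXTREME_LOW < Confidence.LOW < Confidence.MEDIUM
assert Confidence.LOW == 10
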
class datumaro.components.importer.FormatDetectionContext(root_path: str)[source]#

Bases: object

An instance of this class is given to a dataset format detector. See the FormatDetector documentation. The class should not be instantiated directly.

A context encapsulates information about the dataset whose format is being detected. It also offers methods that place requirements on that dataset. Each such method raises a FormatRequirementsUnmet exception if the requirement is not met. If the requirement _is_ met, the return value depends on the method.

property root_path: str#

Returns the path to the root directory of the dataset. Detectors should avoid using this property in favor of specific requirement methods.

raise_unsupported() NoReturn[source]#

Raises a FormatDetectionUnsupported exception to signal that the current format does not support detection.

fail(requirement_desc: str) NoReturn[source]#

Places a requirement that is never met. requirement_desc must contain a human-readable description of the requirement.

require_file(pattern: str, *, exclude_fnames: str | Collection[str] = ()) str[source]#

Places the requirement that the dataset contains at least one file whose relative path matches the given pattern. The pattern must be a glob-like pattern; ** can be used to indicate a sequence of zero or more subdirectories. If the pattern does not describe a relative path, or refers to files outside the dataset root, the requirement is considered unmet. If the requirement is met, the relative path to one of the files that match the pattern is returned. If there are multiple such files, it’s unspecified which one of them is returned.

exclude_fnames must be a collection of patterns or a single pattern. If at least one pattern is supplied, then the placed requirement is narrowed to only accept files with names that match none of these patterns.

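For instance, a detector for the hypothetical "myfmt" layout used earlier might place its requirement like this (a sketch based on the documented signatures; the file patterns are illustrative):

from datumaro.components.importer import (
    FormatDetectionConfidence,
    FormatDetectionContext,
)


def detect_myfmt(context: FormatDetectionContext) -> FormatDetectionConfidence:
    # Require at least one annotation file anywhere under the dataset root,
    # ignoring a hypothetical export log that other tools might also produce.
    context.require_file("**/*.myfmt", exclude_fnames="export_log.myfmt")
    return FormatDetectionConfidence.MEDIUM
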
require_files(pattern: str, *, exclude_fnames: str | Collection[str] = ()) List[str][source]#

Same as require_file, but returns all matching paths in alphabetical order.

require_files_iter(pattern: str, *, exclude_fnames: str | Collection[str] = ()) Iterator[str][source]#

Same as require_files, but returns a generator.

probe_text_file(path: str, requirement_desc: str, is_binary_file: bool = False) Iterator[BufferedReader | TextIO][source]#

Returns a context manager that can be used to place a requirement on the contents of the file referred to by path. To do so, you must enter and exit this context manager (typically, by using the with statement). On entering, the file is opened for reading in text mode and the resulting file object is returned. On exiting, the file object is closed.

The requirement that is placed by doing this is considered met if all of the following are true:

  • path is a relative path that refers to a file within the dataset root.

  • The file is opened successfully.

  • The context is exited without an exception.

If the context is exited with an exception that was produced by another requirement being unmet, that exception is reraised and the new requirement is abandoned.

requirement_desc must be a human-readable statement describing the requirement.

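A hedged sketch of how a detector might use this, assuming the hypothetical format stores annotations as a JSON file with a top-level "items" key (both the file name and the key are illustrative):

import json

from datumaro.components.importer import FormatDetectionContext


def check_annotations(context: FormatDetectionContext) -> None:
    with context.probe_text_file(
        "annotations.myfmt",
        "annotation file must be valid JSON with a top-level 'items' key",
    ) as f:
        contents = json.load(f)
        if not isinstance(contents, dict) or "items" not in contents:
            # Any exception raised here marks the requirement as unmet.
            raise Exception("unexpected annotation file structure")
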
require_any() Iterator[None][source]#

Returns a context manager that can be used to place a requirement that is considered met if at least one of several alternative sets of requirements is met. To do so, use a with statement, with the alternative sets of requirements represented as nested with statements using the context manager returned by alternative:

with context.require_any():
    with context.alternative():
        ...  # place requirements from alternative set 1 here
    with context.alternative():
        ...  # place requirements from alternative set 2 here
    ...

The contents of all with context.alternative() blocks will be executed, even if an alternative that is met is found early.

Requirements must not be placed directly within a with context.require_any() block.

alternative() Iterator[None][source]#

Returns a context manager that can be used in combination with require_any to define alternative requirements. See the documentation for require_any for more details.

Must only be used directly within a with context.require_any() block.

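As a concrete (hedged) illustration, a format that accepts either a JSON or an XML annotation file could combine require_any, alternative, and require_file; the file patterns are illustrative:

from datumaro.components.importer import FormatDetectionContext


def require_annotations(context: FormatDetectionContext) -> None:
    # The requirement is met if at least one alternative set is met.
    with context.require_any():
        with context.alternative():
            context.require_file("annotations/*.json")
        with context.alternative():
            context.require_file("annotations/*.xml")
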
class datumaro.components.importer.TypeVar(name, *constraints, bound=None, covariant=False, contravariant=False)[source]#

Bases: _Final, _Immutable, _TypeVarLike

Type variable.

Usage:

T = TypeVar('T')  # Can be anything
A = TypeVar('A', str, bytes)  # Must be str or bytes

Type variables exist primarily for the benefit of static type checkers. They serve as the parameters for generic types as well as for generic function definitions. See class Generic for more information on generic types. Generic functions work as follows:

def repeat(x: T, n: int) -> List[T]:
    '''Return a list containing n references to x.'''
    return [x]*n

def longest(x: A, y: A) -> A:
    '''Return the longest of two strings.'''
    return x if len(x) >= len(y) else y

The latter example’s signature is essentially the overloading of (str, str) -> str and (bytes, bytes) -> bytes. Also note that if the arguments are instances of some subclass of str, the return type is still plain str.

At runtime, isinstance(x, T) and issubclass(C, T) will raise TypeError.

Type variables defined with covariant=True or contravariant=True can be used to declare covariant or contravariant generic types. See PEP 484 for more details. By default generic types are invariant in all type variables.
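
For instance (a standard typing usage sketch, not Datumaro-specific):

from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Box(Generic[T_co]):
    # Covariant container: a Box[Subclass] is accepted where a Box[Base] is expected.
    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item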

Type variables can be introspected. e.g.:

T.__name__ == 'T'
T.__constraints__ == ()
T.__covariant__ == False
T.__contravariant__ == False
A.__constraints__ == (str, bytes)

Note that only type variables defined in global scope can be pickled.

datumaro.components.importer.contextmanager(func)[source]#

@contextmanager decorator.

Typical usage:

@contextmanager
def some_generator(<arguments>):
    <setup>
    try:
        yield <value>
    finally:
        <cleanup>

This makes this:

with some_generator(<arguments>) as <variable>:
    <body>

equivalent to this:

<setup>
try:
    <variable> = <value>
    <body>
finally:
    <cleanup>

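A concrete, runnable sketch (generic Python, not Datumaro-specific):

import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # <setup>
    start = time.perf_counter()
    try:
        yield start
    finally:
        # <cleanup> runs even if the body raises
        print(f"{label}: {time.perf_counter() - start:.3f}s")

with timed("sleep") as start:
    time.sleep(0.1)
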
datumaro.components.importer.iglob(pathname, *, root_dir=None, dir_fd=None, recursive=False)[source]#

Return an iterator which yields the paths matching a pathname pattern.

The pattern may contain simple shell-style wildcards a la fnmatch. However, unlike fnmatch, filenames starting with a dot are special cases that are not matched by ‘*’ and ‘?’ patterns.

If recursive is true, the pattern ‘**’ will match any files and zero or more directories and subdirectories.

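Usage sketch (the directory layout is hypothetical):

from glob import iglob

# Lazily iterate over all .json files under 'dataset/', including subdirectories.
for path in iglob('dataset/**/*.json', recursive=True):
    print(path)
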
datumaro.components.importer.wraps(wrapped, assigned=('__module__', '__name__', '__qualname__', '__doc__', '__annotations__'), updated=('__dict__',))[source]#

Decorator factory to apply update_wrapper() to a wrapper function

Returns a decorator that invokes update_wrapper() with the decorated function as the wrapper argument and the arguments to wraps() as the remaining arguments. Default arguments are as for update_wrapper(). This is a convenience function to simplify applying partial() to update_wrapper().
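
Typical usage sketch:

from functools import wraps

def logged(func):
    @wraps(func)  # copy func.__name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    """Return the sum of a and b."""
    return a + b

assert add.__name__ == 'add'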