datumaro.plugins.sampler.random_sampler

Classes

`LabelRandomSampler`(extractor, *[, count, ...])	Sampler that keeps at least the required number of annotations of each class in the dataset, for each subset separately.
`RandomSampler`(extractor, count, *[, subset, ...])	Sampler that keeps no more than the required number of items in the dataset.

class datumaro.plugins.sampler.random_sampler.RandomSampler(extractor: IDataset, count: int, *, subset: str | None = None, seed: int | None = None)

Bases: Transform, CliPlugin

Sampler that keeps no more than the required number of items in the dataset.

Notes:

Items are selected uniformly at random.
Requesting a sample larger than the total number of images returns all images.
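
The two notes above can be illustrated with a minimal standalone sketch. It is not Datumaro's implementation: `sample_items` is a hypothetical helper, and it assumes that "uniformly" means uniform sampling without replacement, as provided by Python's `random.sample`.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_items(items: Sequence[T], count: int, seed: Optional[int] = None) -> List[T]:
    """Keep at most `count` items, chosen uniformly without replacement.

    Hypothetical helper that only mirrors the documented behaviour.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Asking for more than is available returns everything, unchanged.
    if count >= len(items):
        return list(items)
    # A seeded generator makes the selection reproducible.
    rng = random.Random(seed)
    return rng.sample(list(items), count)
```

With a fixed `seed`, repeated calls on the same input return the same selection, which is presumably the purpose of the `seed` parameter in the class signature.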

Example: select a random subset of 20 images

random_sampler -k 20

Example: select a random subset of 20 images, modifying only the ‘train’ subset

random_sampler -k 20 -s train
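
The effect of restricting the sampler to one subset can be sketched as follows. This is an illustration under stated assumptions, not the library's code: items are modelled as hypothetical `(item_id, subset_name)` tuples, and items outside the chosen subset are assumed to pass through untouched.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical item model: (item_id, subset_name).
Item = Tuple[str, str]


def sample_in_subset(
    items: List[Item],
    count: int,
    subset: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[Item]:
    """Sample at most `count` items from `subset`; keep other subsets as-is."""
    rng = random.Random(seed)
    # Indices of the items that are candidates for sampling.
    candidates = [
        i for i, (_, name) in enumerate(items) if subset is None or name == subset
    ]
    if count < len(candidates):
        keep = set(rng.sample(candidates, count))
    else:
        keep = set(candidates)  # not enough items: keep all candidates
    return [
        item
        for i, item in enumerate(items)
        if i in keep or (subset is not None and item[1] != subset)
    ]


items = [(f"t{i}", "train") for i in range(30)] + [(f"v{i}", "val") for i in range(10)]
result = sample_in_subset(items, 20, subset="train", seed=0)
```

Here `result` holds 20 ‘train’ items and all 10 ‘val’ items, matching the intent of `-k 20 -s train`.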

classmethod build_cmdline_parser(**kwargs)

class datumaro.plugins.sampler.random_sampler.LabelRandomSampler(extractor: IDataset, *, count: int | None = None, label_counts: Mapping[str, int] | None = None, seed: int | None = None)

Bases: Transform, CliPlugin

Sampler that keeps at least the required number of annotations of each class in the dataset, for each subset separately.

Consider using the “stats” command to get the class distribution in the dataset.

Notes:

Items can contain annotations of several selected classes (e.g. 3 bounding boxes per image). The number of annotations in the resulting dataset varies between max(class counts) and sum(class counts).
If the input dataset does not have enough annotations of a class, the result will contain only what is available.
Items are selected uniformly at random.
For the reasons above, the resulting class distribution in the dataset may differ from the requested one.
The resulting dataset keeps annotations only for classes with a specified count > 0.
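
A rough sketch of how these notes interact is given below. It is an assumed greedy strategy written for illustration, not the algorithm Datumaro uses. Items are modelled as hypothetical `(item_id, labels)` tuples with one list entry per annotation, and a single subset is considered.

```python
import random
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical item model: (item_id, [label of each annotation]).
Item = Tuple[str, List[str]]


def sample_by_label(
    items: List[Item],
    label_counts: Mapping[str, int],
    seed: Optional[int] = None,
) -> List[Item]:
    """Pick items in random order until every requested class is satisfied."""
    rng = random.Random(seed)
    # Only classes with a requested count > 0 are kept in the output.
    wanted = {label for label, n in label_counts.items() if n > 0}
    order = list(range(len(items)))
    rng.shuffle(order)

    collected: Dict[str, int] = {label: 0 for label in wanted}
    picked: List[int] = []
    for idx in order:
        labels = items[idx][1]
        # Take the item if it helps any class that is still short.
        if any(l in wanted and collected[l] < label_counts[l] for l in labels):
            picked.append(idx)
            # Every wanted annotation on the item counts, which is why
            # a class can end up with more than was requested.
            for l in labels:
                if l in wanted:
                    collected[l] += 1
    # Preserve input order and drop annotations of unrequested classes.
    return [
        (items[i][0], [l for l in items[i][1] if l in wanted]) for i in sorted(picked)
    ]
```

Because an item selected for one class carries its other annotations along, the per-class totals can overshoot the request, and a class with too few annotations simply contributes everything it has.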

Example: select at least 5 annotations of each class randomly

label_random_sampler -k 5

Example: select at least 5 images with “cat” annotations and at least 3 with “person” annotations

label_random_sampler -l "cat:5" -l "person:3"
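
The repeated `-l "name:count"` options presumably end up in the `label_counts` mapping of the constructor. One way such options could be parsed is sketched below with the standard `argparse` module. The parser, the `parse_label_count` helper, and `parse_args_to_kwargs` are all hypothetical and may differ from what `build_cmdline_parser` actually builds.

```python
import argparse
from typing import Any, Dict, List, Tuple


def parse_label_count(text: str) -> Tuple[str, int]:
    """Split 'label:count' on the last colon, so labels may contain colons."""
    name, sep, number = text.rpartition(":")
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected 'label:count', got {text!r}")
    try:
        count = int(number)
    except ValueError:
        raise argparse.ArgumentTypeError(f"count must be an integer in {text!r}")
    if count < 0:
        raise argparse.ArgumentTypeError(f"count must be non-negative in {text!r}")
    return name, count


def build_parser() -> argparse.ArgumentParser:
    # Only the short flags shown in the examples are modelled here.
    parser = argparse.ArgumentParser(prog="label_random_sampler_sketch")
    parser.add_argument("-k", dest="count", type=int, default=None)
    parser.add_argument(
        "-l", dest="label_counts", action="append", type=parse_label_count
    )
    return parser


def parse_args_to_kwargs(argv: List[str]) -> Dict[str, Any]:
    """Turn command-line arguments into constructor-style keyword arguments."""
    args = build_parser().parse_args(argv)
    return {"count": args.count, "label_counts": dict(args.label_counts or [])}


kwargs = parse_args_to_kwargs(["-l", "cat:5", "-l", "person:3"])
```

In this sketch `kwargs["label_counts"]` is `{"cat": 5, "person": 3}`, the shape expected by the `label_counts: Mapping[str, int]` parameter.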

classmethod build_cmdline_parser(**kwargs)