class datumaro.plugins.splitter.SplitTask(value)

Bases: Enum

An enumeration.

classification = 1
detection = 2
segmentation = 3
reid = 4
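Enum members can be looked up either by name or by value. As a minimal sketch, the snippet below re-declares the four documented members in a standalone class, so it runs without Datumaro installed, and shows both lookups. Treating the `-t` command-line string as a member name is an assumption; this page does not document how the value is consumed.

```python
from enum import Enum


# Standalone stand-in mirroring the documented members so this runs without
# Datumaro; the real class is datumaro.plugins.splitter.SplitTask.
class SplitTask(Enum):
    classification = 1
    detection = 2
    segmentation = 3
    reid = 4


# Look a member up by name (e.g. a "-t reid" value) or by its integer value.
task = SplitTask["reid"]
same_task = SplitTask(4)
print(task is same_task)            # True
print([t.name for t in SplitTask])  # ['classification', 'detection', 'segmentation', 'reid']
```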
class datumaro.plugins.splitter.Split(dataset, task, splits, query=None, attr_for_id=None, seed=None)

Bases: Transform, CliPlugin

  • classification split

    Splits the dataset into subsets (train/val/test) in a class-wise manner: images are split in the specified ratio while keeping the initial class distribution.

  • detection & segmentation split

    Each image can have multiple object annotations (bboxes, masks, polygons). Since an image shouldn’t be included in multiple subsets at the same time, and an image’s annotations shouldn’t be split across subsets, dataset annotations generally can’t be split exactly in the specified ratio. This split assigns images so that the result is as close as possible to the specified ratio while keeping the initial class distribution.

  • reidentification split

    In this task, the test set should consist of images of people or objects unseen during the training phase. This function splits the dataset as follows:

    1. Splits the dataset into ‘train + val’ and ‘test’ sets based on person or object ID.

    2. Splits the ‘test’ set into ‘test-gallery’ and ‘test-query’ sets in a class-wise manner.

    3. Splits the ‘train + val’ set into ‘train’ and ‘val’ sets in the same way.

The final subsets are ‘train’, ‘val’, ‘test-gallery’ and ‘test-query’.
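To illustrate why detection and segmentation ratios can only be approximated, here is a simple greedy heuristic. It is not Datumaro’s actual algorithm: whole images are placed one at a time into the subset that can still absorb the most of their annotations, per class. The function name `greedy_image_split` and its dict-of-labels input format are hypothetical.

```python
from collections import Counter


def greedy_image_split(images, splits):
    """Assign whole images to subsets, approximating per-class ratios.

    images: dict image_id -> list of class labels, one per annotation
            (hypothetical input format).
    splits: list of (subset_name, ratio) pairs whose ratios sum to 1.
    Returns a dict image_id -> subset_name.
    """
    totals = Counter()
    for labels in images.values():
        totals.update(labels)
    # Target number of annotations of each class in each subset.
    targets = {name: {lbl: ratio * n for lbl, n in totals.items()}
               for name, ratio in splits}
    placed = {name: Counter() for name, _ in splits}

    def need(name, labels):
        # How many of this image's annotations the subset can still absorb;
        # negative when the subset is already over its target for a class.
        return sum(min(n, targets[name][lbl] - placed[name][lbl])
                   for lbl, n in labels.items())

    assignment = {}
    # Images with the most annotations are the hardest to fit, so place them first.
    for image_id in sorted(images, key=lambda i: len(images[i]), reverse=True):
        labels = Counter(images[image_id])
        # Ties go to the subset listed first in `splits`.
        best = max((name for name, _ in splits), key=lambda name: need(name, labels))
        assignment[image_id] = best
        placed[best].update(labels)
    return assignment


images = {f"img{i}": ["car", "person"] for i in range(10)}
result = greedy_image_split(images, [("train", 0.5), ("val", 0.2), ("test", 0.3)])
print(dict(sorted(Counter(result.values()).items())))
# {'test': 3, 'train': 5, 'val': 2}
```

Because an image is never divided, a dataset whose images carry uneven mixes of classes will usually land near, but not exactly on, the requested ratios.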

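The three reidentification steps can be sketched in plain Python. This is an illustrative approximation under assumed inputs, not Datumaro’s implementation: `reid_split` is a hypothetical helper that takes (image_id, object_id) pairs, treats `query` as the share of each test identity’s images sent to ‘test-query’ (the gallery gets 1.0 - query), and splits each train+val identity’s images between ‘train’ and ‘val’ in proportion to their ratios.

```python
import random
from collections import Counter


def reid_split(samples, ratios, query, seed=None):
    """Hypothetical re-ID split.

    samples: list of (image_id, object_id) pairs.
    ratios:  dict with 'train', 'val' and 'test' ratios summing to 1.
    query:   share of each test identity's images that go to 'test-query';
             the remaining 1.0 - query go to 'test-gallery'.
    """
    rng = random.Random(seed)
    by_id = {}
    for image_id, object_id in samples:
        by_id.setdefault(object_id, []).append(image_id)

    # Step 1: split by identity, so test identities never appear in train/val.
    ids = sorted(by_id)
    rng.shuffle(ids)
    n_test = round(len(ids) * ratios["test"])
    test_ids, trainval_ids = ids[:n_test], ids[n_test:]

    def split_images(object_id, first, second, first_share, assignment):
        # Split one identity's images: `first_share` of them go to `first`.
        images = list(by_id[object_id])
        rng.shuffle(images)
        cut = round(len(images) * first_share)
        for i, image_id in enumerate(images):
            assignment[image_id] = first if i < cut else second

    assignment = {}
    # Step 2: within each test identity, split images into query and gallery.
    for object_id in test_ids:
        split_images(object_id, "test-query", "test-gallery", query, assignment)
    # Step 3: within each train+val identity, split images into val and train.
    val_share = ratios["val"] / (ratios["train"] + ratios["val"])
    for object_id in trainval_ids:
        split_images(object_id, "val", "train", val_share, assignment)
    return assignment


samples = [(f"p{p}_{k}", f"p{p}") for p in range(10) for k in range(4)]
result = reid_split(samples, {"train": 0.5, "val": 0.2, "test": 0.3}, query=0.5, seed=0)
print(dict(sorted(Counter(result.values()).items())))
# {'test-gallery': 6, 'test-query': 6, 'train': 21, 'val': 7}
```

Here 3 of the 10 identities become test identities, and their 12 images are divided evenly between query and gallery. The other 7 identities each contribute images to both ‘train’ and ‘val’.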
Notes:

  • Each image is expected to have only one annotation. Unlabeled or multi-labeled images are split into subsets randomly.

  • If labels also have attributes, the split is also stratified by attribute values.

  • If there are not enough images in some class or attribute group, the split ratio can’t be guaranteed.

In the reidentification task:

  • The object ID can be described by a label or by an attribute (the --attr parameter).

  • The splits of the test set are controlled by the --query parameter. The gallery ratio is 1.0 - query.
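Here is a sketch of the class-wise split combined with the notes above, assuming a hypothetical per-image list of (label, attributes) pairs. Images are grouped by label plus attribute values, each group is cut at the cumulative ratio boundaries, and unlabeled or multi-labeled images are assigned at random. Weighting those random draws by the subset ratios is an assumption; the notes only say such images are split randomly. The names `group_key` and `classwise_split` are hypothetical.

```python
import random
from collections import defaultdict


def group_key(labels):
    """labels: list of (label, attributes) pairs for one image (hypothetical format).
    Returns None for unlabeled or multi-labeled images."""
    if len(labels) != 1:
        return None
    label, attributes = labels[0]
    # Attribute values are part of the key, so each label+attributes
    # combination keeps its own distribution across subsets.
    return (label, tuple(sorted(attributes.items())))


def classwise_split(images, splits, seed=None):
    """images: dict image_id -> list of (label, attributes) pairs.
    splits: list of (subset_name, ratio) pairs whose ratios sum to 1.
    Returns a dict image_id -> subset_name."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for image_id, labels in images.items():
        groups[group_key(labels)].append(image_id)

    names = [name for name, _ in splits]
    weights = [ratio for _, ratio in splits]
    assignment = {}
    for key, ids in groups.items():
        rng.shuffle(ids)
        if key is None:
            # Unlabeled or multi-labeled: pick a subset at random, weighted by ratio.
            for image_id in ids:
                assignment[image_id] = rng.choices(names, weights)[0]
            continue
        # Cut the shuffled group at the cumulative ratio boundaries; the
        # last subset takes the remainder so every image is assigned.
        start, cumulative = 0, 0.0
        for i, (name, ratio) in enumerate(splits):
            cumulative += ratio
            end = len(ids) if i == len(splits) - 1 else round(len(ids) * cumulative)
            for image_id in ids[start:end]:
                assignment[image_id] = name
            start = end
    return assignment


images = {f"c{i}": [("cat", {"occluded": False})] for i in range(10)}
images.update({f"o{i}": [("cat", {"occluded": True})] for i in range(20)})
images["unlabeled"] = []
result = classwise_split(images, [("train", 0.5), ("val", 0.2), ("test", 0.3)], seed=0)
```

With these inputs the 10 non-occluded cats split 5/2/3 and the 20 occluded cats split 10/4/6, so each attribute group keeps the requested proportions on its own. A group too small to divide this way illustrates the note that the ratio can’t be guaranteed.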


Examples:

split -t classification --subset train:.5 --subset val:.2 --subset test:.3

split -t detection --subset train:.5 --subset val:.2 --subset test:.3

split -t segmentation --subset train:.5 --subset val:.2 --subset test:.3

split -t reid --subset train:.5 --subset val:.2 --subset test:.3 --query .5

Example: use the ‘person_id’ attribute for splitting:

split --attr person_id
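Each `--subset` value in the examples has the form NAME:RATIO. The hypothetical `parse_subsets` helper below sketches how such values could be parsed and checked. Requiring the ratios to sum to 1 is an assumption that the examples above happen to satisfy, not a rule documented on this page.

```python
import math


def parse_subsets(values):
    """Parse NAME:RATIO strings such as 'train:.5' into (name, ratio) pairs."""
    splits = []
    for value in values:
        name, sep, ratio_text = value.partition(":")
        if not sep or not name:
            raise ValueError(f"expected NAME:RATIO, got {value!r}")
        ratio = float(ratio_text)  # also raises ValueError on a malformed number
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio for {name!r} must be in [0, 1], got {ratio}")
        splits.append((name, ratio))
    total = sum(ratio for _, ratio in splits)
    # Compare with a tolerance: sums of decimal fractions are rarely exact floats.
    if not math.isclose(total, 1.0):
        raise ValueError(f"subset ratios must sum to 1, got {total}")
    return splits


print(parse_subsets(["train:.5", "val:.2", "test:.3"]))
# [('train', 0.5), ('val', 0.2), ('test', 0.3)]
```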
classmethod build_cmdline_parser(**kwargs)

Enumerates subsets in the dataset. Each subset can be a dataset itself.