Datumaro#

Format specification#

Datumaro format is Datumaro’s own data format. It aims to cover all media types and annotation types in Datumaro as possible. Therefore, if you do not want information loss when re-importing your dataset by Datumaro, we recommend exporting your dataset using the Datumaro format. In addition, you can directly use the Datumaro format for the model training using OpenVINO™ Training Extensions.

Supported media types:

  • Image

  • PointCloud

  • VideoFrame

Supported annotation types:

  • Label

  • Mask

  • PolyLine

  • Polygon

  • Bbox

  • Points

  • Caption

  • Cuboid3d

  • Ellipse

Supported annotation attributes:

  • No restrictions

Import Datumaro dataset#

A Datumaro project with a Datumaro source can be created in the following way:

datum project create
datum project import --format datumaro <path/to/dataset>

It is possible to specify project name and project directory. Run datum project create --help for more information.

A Datumaro dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── images/
    │   ├── <subset_name_1>/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    │   └── <subset_name_2> /
    │       ├── <image_name1.ext>
    │       ├── <image_name2.ext>
    │       └── ...
    ├── videos/  # directory to store video files
    └── annotations/
        ├── <subset_name_1>.json
        ├── <subset_name_2>.json
        └── ...

If your dataset is not following the above directory structure, it cannot detect and import your dataset as the Datumaro format properly.

To add custom classes, you can use dataset_meta.json.

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.

Export to other formats#

It can convert Datumaro dataset into any other format Datumaro supports. To get the expected result, convert the dataset to formats that support the specified task (e.g. for panoptic segmentation - VOC, CamVID)

There are several ways to convert a Datumaro dataset to other dataset formats using CLI:

  • Export a dataset from Datumaro format to VOC format:

datum project create
datum project import -f datumaro <path/to/dataset>
datum project export -f voc -o <output/dir>

or

datum convert -if datumaro -i <path/to/dataset> -f voc -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'datumaro')
dataset.export('save_dir', 'voc', save_media=True)

Export to Datumaro#

There are several ways to convert a dataset to Datumaro format:

  • Export a dataset from an existing project to Datumaro format:

# export dataset into Datumaro format from existing project
datum project export -p <path/to/project> -f datumaro -o <output/dir> \
    -- --save-media
  • Convert a dataset from VOC format to Datumaro format:

# converting to Datumaro format from other format
datum convert -if voc -i <path/to/dataset> \
    -f datumaro -o <output/dir> -- --save-media

Extra options for exporting to Datumaro format:

  • --save-media allow to export dataset with saving media files (by default False)

Examples#

Examples of using this format from the code can be found in the format tests