# Datumaro ## Format specification Datumaro format is [Datumaro](https://github.com/openvinotoolkit/datumaro)'s own data format. It aims to cover all media types and annotation types in Datumaro as possible. Therefore, if you do not want information loss when re-importing your dataset by [Datumaro](https://github.com/openvinotoolkit/datumaro), we recommend exporting your dataset using the Datumaro format. In addition, you can directly use the Datumaro format for the model training using [OpenVINO™ Training Extensions](https://github.com/openvinotoolkit/training_extensions). Supported media types: - `Image` - `PointCloud` - `VideoFrame` Supported annotation types: - `Label` - `Mask` - `PolyLine` - `Polygon` - `Bbox` - `Points` - `Caption` - `Cuboid3d` - `Ellipse` Supported annotation attributes: - No restrictions ## Import Datumaro dataset A Datumaro project with a Datumaro source can be created in the following way: ```console datum project create datum project import --format datumaro ``` It is possible to specify project name and project directory. Run `datum project create --help` for more information. A Datumaro dataset directory should have the following structure: ``` └─ Dataset/ ├── dataset_meta.json # a list of custom labels (optional) ├── images/ │ ├── / │ │ ├── │ │ ├── │ │ └── ... │ └── / │ ├── │ ├── │ └── ... ├── videos/ # directory to store video files │ ├── / │ │ ├── │ │ ├── │ │ └── ... │ └── / │ ├── │ ├── │ └── ... └── annotations/ ├── .json ├── .json └── ... ``` If your dataset is not following the above directory structure, it cannot detect and import your dataset as the Datumaro format properly. To add custom classes, you can use [`dataset_meta.json`](/docs/data-formats/formats/index.rst#dataset-meta-info-file). To make sure that the selected dataset has been added to the project, you can run `datum project info`, which will display the project information. ## Export to other formats It can convert Datumaro dataset into any other format [Datumaro supports](/docs/data-formats/formats/index.rst). To get the expected result, convert the dataset to formats that support the specified task (e.g. for panoptic segmentation - VOC, CamVID) There are several ways to convert a Datumaro dataset to other dataset formats using CLI: - Export a dataset from Datumaro format to VOC format: ```console datum project create datum project import -f datumaro datum project export -f voc -o ``` or ```console datum convert -if datumaro -i -f voc -o ``` Or, using Python API: ```python import datumaro as dm dataset = dm.Dataset.import_from('', 'datumaro') dataset.export('save_dir', 'voc', save_media=True) ``` ## Export to Datumaro There are several ways to convert a dataset to Datumaro format: - Export a dataset from an existing project to Datumaro format: ```console # export dataset into Datumaro format from existing project datum project export -p -f datumaro -o \ -- --save-media ``` - Convert a dataset from VOC format to Datumaro format: ```console # converting to Datumaro format from other format datum convert -if voc -i \ -f datumaro -o -- --save-media ``` Extra options for exporting to Datumaro format: - `--save-media` allow to export dataset with saving media files (by default `False`) ## Examples Examples of using this format from the code can be found in [the format tests](https://github.com/openvinotoolkit/datumaro/tree/develop/tests/unit/data_formats/datumaro/test_datumaro_format.py)