Level 3: Data Import and Export#
Datumaro is a tool that supports public data formats across a wide range of tasks such as classification, detection, segmentation, pose estimation, or visual tracking. To facilitate this, Datumaro provides assistance with data import and export via both Python API and CLI. This makes it easier for users to work with various data formats using Datumaro.
For the segmentation task, we introduce Cityscapes, a dataset that collects road scenes from 50 different cities and contains 5K fine-grained pixel-level annotations and 20K coarse annotations. A more detailed description is given here. The Cityscapes dataset is available for free download.
Convert data format#
Users sometimes need to compare, merge, or manage various kinds of public datasets in a unified
system. To achieve this, Datumaro provides not only
import and export functionalities, but also
convert, which combines the import and export steps into a single command.
Let’s convert the Cityscapes data into the MS-COCO format, which is described here.
Without creating a project, we can do this with the single-line
convert command in Datumaro:
datum convert -if cityscapes -i <path/to/cityscapes> -f coco_panoptic -o <path/to/output>
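When the conversion has to be launched from a script, the same command can be assembled and run from Python. The following is only a minimal sketch: the helper names `build_convert_command` and `run_convert` are hypothetical, and it assumes the `datum` executable is available on the `PATH`. It uses only the flags shown in the command above.

```python
import shlex
import subprocess


def build_convert_command(input_path, input_format, output_path, output_format):
    """Return the `datum convert` invocation as an argument list (hypothetical helper)."""
    return [
        "datum", "convert",
        "-if", input_format,     # format of the source dataset
        "-i", str(input_path),   # path to the source dataset
        "-f", output_format,     # format to convert to
        "-o", str(output_path),  # directory for the converted dataset
    ]


def run_convert(input_path, input_format, output_path, output_format):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    command = build_convert_command(input_path, input_format, output_path, output_format)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Passing the arguments as a list instead of a single string avoids shell quoting problems when paths contain spaces.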
With the Python API, we can import the data through
Dataset as shown below.
from datumaro.components.dataset import Dataset

data_path = '/path/to/cityscapes'
data_format = 'cityscapes'
dataset = Dataset.import_from(data_path, data_format)
We then export the imported dataset as follows.
output_path = '/path/to/output'
dataset.export(output_path, format='coco_panoptic')
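The two Python API steps can be wrapped into one reusable function. This is a sketch under stated assumptions, not part of Datumaro itself: `convert_dataset` is a hypothetical name, the early path check is an added convenience, and only the `Dataset.import_from` and `Dataset.export` calls shown above are used.

```python
from pathlib import Path


def convert_dataset(input_path, input_format, output_path, output_format):
    """Import a dataset in one format and export it in another (hypothetical helper)."""
    input_path = Path(input_path)
    if not input_path.exists():
        # Fail early with a clear message before loading Datumaro.
        raise FileNotFoundError(f"Dataset path does not exist: {input_path}")

    # Imported lazily so the path check above runs even if Datumaro is not installed.
    from datumaro.components.dataset import Dataset

    dataset = Dataset.import_from(str(input_path), input_format)
    dataset.export(str(output_path), format=output_format)
    return dataset
```

For example, `convert_dataset('/path/to/cityscapes', 'cityscapes', '/path/to/output', 'coco_panoptic')` would reproduce the conversion described in this section.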
With the project-based CLI, we first need to
create a project with
datum project create -o <path/to/project>
Import the Cityscapes data into the project with
datum project import --format cityscapes -p <path/to/project> <path/to/cityscapes>
(Optional) When we import data, the change is automatically committed to the project.
This can be verified with
datum project log -p <path/to/project>
(Optional) We can check information about the imported dataset, such as its subsets or the number of data items, with
datum project info -p <path/to/project>
Export the data within the project in the MS-COCO format with
datum project export --format coco -p <path/to/project> -o <path/to/save> -- --save-media
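The project-based steps above can also be collected into a single script. The sketch below is illustrative only: the function names are hypothetical, it assumes the `datum` executable is on the `PATH`, and it simply replays the commands listed in this section in order, stopping at the first failure.

```python
import subprocess


def project_workflow_commands(project_dir, data_dir, output_dir):
    """Return the project-based CLI steps from this section, in order (hypothetical helper)."""
    return [
        ["datum", "project", "create", "-o", project_dir],
        ["datum", "project", "import", "--format", "cityscapes", "-p", project_dir, data_dir],
        ["datum", "project", "log", "-p", project_dir],   # optional: show the commit history
        ["datum", "project", "info", "-p", project_dir],  # optional: show dataset information
        ["datum", "project", "export", "--format", "coco", "-p", project_dir,
         "-o", output_dir, "--", "--save-media"],
    ]


def run_project_workflow(project_dir, data_dir, output_dir):
    """Run each step in turn; check=True aborts on the first non-zero exit code."""
    for command in project_workflow_commands(project_dir, data_dir, output_dir):
        subprocess.run(command, check=True)
```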
Even if you are not sure about the format of a dataset, there’s no need to worry. You can easily detect the format, as described in the next level!