There are several options available:

Standalone tool

As a standalone tool, Datumaro lets you perform various dataset operations from the command line:

datum --help
python -m datumaro --help

Python module

Datumaro can also be used in custom scripts as a Python module. This makes its features available from an existing codebase: it reads, iterates over, and exports datasets, simplifies the integration of custom formats, and provides high-performance operations:

import datumaro as dm

dataset = dm.Dataset.import_from('path/', 'voc')

# keep only annotated images
dataset.select(lambda item: len(item.annotations) != 0)

# change dataset labels and corresponding annotations
dataset.transform('remap_labels',
    mapping={
      'cat': 'dog', # rename cat to dog
      'truck': 'car', # rename truck to car
      'person': '', # remove this label
    },
    default='delete') # remove everything else

# iterate over the dataset elements
for item in dataset:
    print(item.id, item.annotations)

# export the resulting dataset in COCO format
dataset.export('dst/dir', 'coco', save_images=True)
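
The label mapping in the example above follows three rules stated in its comments: a label mapped to a new name is renamed, a label mapped to an empty string is removed, and with default='delete' every label not listed is removed as well. The sketch below restates those rules in plain Python so the outcome is easy to check by hand. It does not use Datumaro and is not its implementation; the `remap_labels` function, the sample label list, and the 'keep' alternative for `default` are assumptions made for illustration only.

```python
# Plain-Python illustration of the remapping rules; this is NOT Datumaro code.

def remap_labels(labels, mapping, default="delete"):
    """Return the label names left after applying `mapping` (hypothetical helper).

    - a label mapped to a non-empty name is renamed
    - a label mapped to '' is removed
    - an unlisted label is removed when default == 'delete'
      and kept unchanged when default == 'keep' (assumed alternative)
    """
    if default not in ("delete", "keep"):
        raise ValueError(f"unsupported default action: {default!r}")

    result = []
    for label in labels:
        if label in mapping:
            new_name = mapping[label]
            if new_name:  # an empty string means "remove this label"
                result.append(new_name)
        elif default == "keep":
            result.append(label)
        # default == 'delete': unlisted labels are dropped
    return result


mapping = {
    "cat": "dog",    # rename cat to dog
    "truck": "car",  # rename truck to car
    "person": "",    # remove this label
}
labels = ["cat", "truck", "person", "bicycle"]  # sample labels, made up for the demo

remaining = remap_labels(labels, mapping)
print(remaining)  # ['dog', 'car']
```

Here 'bicycle' disappears because it is not listed in the mapping and the default action is 'delete'; 'person' disappears because it is mapped to an empty string.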