# CIFAR

## Format specification

CIFAR format specification is available [here](https://www.cs.toronto.edu/~kriz/cifar.html).

Supported annotation types:
- `Label`

Datumaro supports the Python version of CIFAR-10/100.
The difference between CIFAR-10 and CIFAR-100 is how labels are stored
in the meta files (`batches.meta` or `meta`) and in the annotation files.
The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image
comes with a "fine" label (the class to which it belongs) and a "coarse" label
(the superclass to which it belongs). In CIFAR-10 there are no superclasses.

CIFAR formats contain 32 x 32 images. As an extension, Datumaro supports
reading and writing of arbitrary-sized images.

## Import CIFAR dataset

The CIFAR dataset is available for free download:

- [cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz):
  CIFAR-10 python version
- [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz):
  CIFAR-100 python version

A Datumaro project with a CIFAR source can be created in the following way:

``` bash
datum project create
datum project import --format cifar 
```

It is possible to specify a project name and a project directory. Run
`datum project create --help` for more information.

CIFAR-10 dataset directory should have the following structure:

```
└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── batches.meta
    ├── 
    ├── 
    └── ...
```

CIFAR-100 dataset directory should have the following structure:

```
└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── meta
    ├── 
    ├── 
    └── ...
```

Dataset files use the [Pickle](https://docs.python.org/3/library/pickle.html)
data format.
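Since the two directory layouts differ mainly in the name of the meta file, the variant of a directory can be guessed from which meta file it contains. The sketch below assumes exactly that; `detect_cifar_variant` is a hypothetical helper, not Datumaro's own format detection, and it does not inspect the subset files.

```python
import tempfile
from pathlib import Path


def detect_cifar_variant(dataset_dir):
    """Guess the CIFAR variant of a dataset directory (hypothetical helper).

    Only the meta file name is checked: 'batches.meta' suggests CIFAR-10,
    'meta' suggests CIFAR-100. Subset files are not inspected, and this is
    not how Datumaro itself detects the format.
    """
    root = Path(dataset_dir)
    if (root / "batches.meta").is_file():
        return "cifar-10"
    if (root / "meta").is_file():
        return "cifar-100"
    return None  # neither meta file was found


# Usage: an otherwise empty directory holding 'batches.meta' looks like CIFAR-10.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "batches.meta").touch()
    print(detect_cifar_variant(tmp))  # prints "cifar-10"
```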
Meta files:

```
CIFAR-10:
num_cases_per_batch: 10000
label_names: list of strings (['airplane', 'automobile', 'bird', ...])
num_vis: 3072

CIFAR-100:
fine_label_names: list of strings (['apple', 'aquarium_fish', ...])
coarse_label_names: list of strings (['aquatic_mammals', 'fish', ...])
```

Annotation files:

```
Common:
'batch_label': 'training batch 1 of '
'data': numpy.ndarray of uint8, layout N x C x H x W
'filenames': list of strings

If images have non-default size (32x32) (Datumaro extension):
'image_sizes': list of (H, W) tuples

CIFAR-10:
'labels': list of integers

CIFAR-100:
'fine_labels': list of integers
'coarse_labels': list of integers
```

To add custom classes, you can use [`dataset_meta.json`](/docs/data-formats/formats/index.rst#dataset-meta-info-file).

## Export to other formats

Datumaro can convert a CIFAR dataset into any other format [Datumaro supports](/docs/data-formats/formats/index.rst).
To get the expected result, convert the dataset to a format
that supports the classification task (e.g. MNIST, ImageNet, PascalVOC, etc.).
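The annotation file layout above can be decoded with plain `pickle` and NumPy. This is a minimal sketch, assuming every image has the default 32 x 32 size (the optional `image_sizes` key is ignored). The original archives were pickled under Python 2, so they are loaded with `encoding="bytes"` and their keys come back as `bytes`; the hypothetical helper `load_cifar_batch` decodes them to `str`. The in-memory batch is synthetic data standing in for a real annotation file.

```python
import io
import pickle

import numpy as np


def load_cifar_batch(fileobj):
    """Unpickle one CIFAR file and return a dict with str keys (hypothetical helper).

    The original archives were pickled with Python 2, so they are loaded with
    encoding="bytes"; their keys then come back as bytes and are decoded here.
    """
    raw = pickle.load(fileobj, encoding="bytes")
    return {k.decode() if isinstance(k, bytes) else k: v for k, v in raw.items()}


def to_hwc_images(batch, height=32, width=32):
    """Turn 'data' (N x C x H x W, possibly flattened to N x 3072) into N x H x W x C."""
    data = np.asarray(batch["data"], dtype=np.uint8)
    return data.reshape(len(data), 3, height, width).transpose(0, 2, 3, 1)


# Synthetic CIFAR-10-style batch: 2 images whose red channel is 255, green/blue 0.
label_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
pixels = np.zeros((2, 3 * 32 * 32), dtype=np.uint8)
pixels[:, : 32 * 32] = 255  # the first 1024 values of each row form the red plane
fake_file = io.BytesIO(pickle.dumps({
    "batch_label": "synthetic batch",
    "data": pixels,
    "filenames": ["img_0.png", "img_1.png"],
    "labels": [3, 5],
}))

batch = load_cifar_batch(fake_file)
images = to_hwc_images(batch)
names = [label_names[i] for i in batch["labels"]]
print(images.shape, names)  # prints (2, 32, 32, 3) ['cat', 'dog']
```

For CIFAR-100 files the same reshaping applies; the label indices come from `fine_labels` and `coarse_labels` and map through `fine_label_names` and `coarse_label_names` from the `meta` file.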
There are several ways to convert a CIFAR dataset to other dataset formats
using CLI:

``` bash
datum project create
datum project import -f cifar 
datum project export -f imagenet -o 
```

or

``` bash
datum convert -if cifar -i  \
    -f imagenet -o  -- --save-media
```

Or, using Python API:

```python
import datumaro as dm

dataset = dm.Dataset.import_from('', 'cifar')
dataset.export('save_dir', 'imagenet', save_media=True)
```

## Export to CIFAR

There are several ways to convert a dataset to CIFAR format:

``` bash
# export dataset into CIFAR format from existing project
datum project export -p  -f cifar -o  \
    -- --save-media
```

``` bash
# converting to CIFAR format from other format
datum convert -if imagenet -i  \
    -f cifar -o  -- --save-media
```

Extra options for exporting to CIFAR format:
- `--save-media` allows exporting the dataset together with its media files
  (default: `False`)
- `--image-ext ` allows specifying the image extension used when exporting
  the dataset (default: `.png`)
- `--save-dataset-meta` allows exporting the dataset together with the dataset
  meta file (default: `False`)

Whether the dataset is exported as CIFAR-10 or CIFAR-100 depends on the
presence of superclasses in the `LabelCategories`.

## Examples

Datumaro supports filtering, transformation, merging etc. for all formats
and for the CIFAR format in particular. Follow the
[user manual](../../user-manual/how_to_use_datumaro)
to get more information about these operations.

There are several examples of using Datumaro operations to solve
particular problems with a CIFAR dataset:

### Example 1. How to create a custom CIFAR-like dataset

```python
import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id=0, image=np.ones((32, 32, 3)),
        annotations=[dm.Label(3)]
    ),
    dm.DatasetItem(id=1, image=np.ones((32, 32, 3)),
        annotations=[dm.Label(8)]
    )
], categories=['airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck'])

dataset.export('./dataset', format='cifar')
```

### Example 2. How to filter and convert a CIFAR dataset to ImageNet

Convert a CIFAR dataset to ImageNet format, keeping only images with the
`dog` class present:

``` bash
# Download CIFAR-10 dataset:
# https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
datum convert --input-format cifar --input-path  \
    --output-format imagenet \
    --filter '/item[annotation/label="dog"]'
```

Examples of using this format from the code can be found in
[the format tests](https://github.com/openvinotoolkit/datumaro/blob/develop/tests/unit/test_cifar_format.py).
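To see what the filter in Example 2 keeps, the same selection can be sketched on an already decoded CIFAR-10 batch with plain NumPy. This is a swapped-in illustration, not how Datumaro evaluates `--filter`; `keep_label` is a hypothetical helper, the label list assumes the standard CIFAR-10 class order (the same one used in Example 1), and the arrays are synthetic.

```python
import numpy as np

# Standard CIFAR-10 class order (assumed); index 5 is "dog".
LABEL_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def keep_label(images, labels, filenames, name, label_names=LABEL_NAMES):
    """Return the items of a decoded batch whose label is `name` (hypothetical helper).

    Plain-NumPy analogue of '/item[annotation/label="dog"]' for single-label
    items; it is not Datumaro's filter implementation.
    """
    target = label_names.index(name)
    mask = np.asarray(labels) == target  # boolean mask over the batch
    kept_files = [f for f, keep in zip(filenames, mask) if keep]
    return images[mask], np.asarray(labels)[mask], kept_files


# Synthetic batch: 4 blank images, two of them labeled "dog".
images = np.zeros((4, 32, 32, 3), dtype=np.uint8)
labels = [5, 3, 5, 0]
filenames = ["a.png", "b.png", "c.png", "d.png"]

dog_images, dog_labels, dog_files = keep_label(images, labels, filenames, "dog")
print(dog_images.shape, dog_files)  # prints (2, 32, 32, 3) ['a.png', 'c.png']
```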