Format specification

The original CelebA dataset is available here.

Supported annotation types:

  • Label

  • Bbox

  • Points (landmarks)

Supported attributes:

  • 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair, Blurry, Brown_Hair, Bushy_Eyebrows, Chubby, Double_Chin, Eyeglasses, Goatee, Gray_Hair, Heavy_Makeup, High_Cheekbones, Male, Mouth_Slightly_Open, Mustache, Narrow_Eyes, No_Beard, Oval_Face, Pale_Skin, Pointy_Nose, Receding_Hairline, Rosy_Cheeks, Sideburns, Smiling, Straight_Hair, Wavy_Hair, Wearing_Earrings, Wearing_Hat, Wearing_Lipstick, Wearing_Necklace, Wearing_Necktie, Young (boolean)
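As an illustration of how these boolean attributes relate to the annotation file, here is a minimal sketch that parses the text of list_attr_celeba.txt into per-image dictionaries. It assumes the layout used by the original CelebA release (first line: image count, second line: attribute names, remaining lines: image name followed by 1 or -1 per attribute). The function name `parse_attr_file` and the sample data are hypothetical and are not part of Datumaro.

```python
def parse_attr_file(text):
    """Parse list_attr_celeba.txt content into {image_name: {attribute: bool}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumed layout: line 0 = image count, line 1 = attribute names, rest = rows.
    names = lines[1].split()
    result = {}
    for row in lines[2:]:
        image, *flags = row.split()
        if len(flags) != len(names):
            raise ValueError(
                f"{image}: expected {len(names)} values, got {len(flags)}"
            )
        # A positive value marks the attribute as present, -1 as absent.
        result[image] = {name: int(flag) > 0 for name, flag in zip(names, flags)}
    return result


# Hypothetical three-attribute sample; the real file lists all 40 attributes.
sample = """2
Bald Eyeglasses Smiling
000001.jpg -1  1  1
000002.jpg  1 -1 -1
"""
attrs = parse_attr_file(sample)
```

Raising on a row whose length does not match the header keeps a truncated or shifted row from silently assigning values to the wrong attributes.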

Import CelebA dataset

A Datumaro project with a CelebA source can be created in the following way:

datum project create
datum project import --format celeba <path/to/dataset>

It is also possible to import the dataset using the Python API:

import datumaro as dm

celeba_dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')

The CelebA dataset directory should have the following structure:

├── dataset_meta.json # a list of non-format labels (optional)
├── Anno/
│   ├── identity_CelebA.txt
│   ├── list_attr_celeba.txt
│   ├── list_bbox_celeba.txt
│   └── list_landmarks_celeba.txt
├── Eval/
│   └── list_eval_partition.txt
└── Img/
    └── img_celeba/
        ├── 000001.jpg
        ├── 000002.jpg
        └── ...

The identity_CelebA.txt file is required and contains the labels. The list_attr_celeba.txt, list_bbox_celeba.txt, list_landmarks_celeba.txt, and list_eval_partition.txt files are optional and contain attributes, bounding boxes, landmarks, and subsets, respectively.
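Before importing, it can help to check that a directory matches this layout. The following is a minimal sketch using only the standard library; the helper `check_celeba_layout` is hypothetical and not part of Datumaro, and treating the Img/img_celeba directory as required is an assumption of this sketch.

```python
from pathlib import Path

# Paths are relative to the dataset root, as in the tree shown above.
REQUIRED_FILES = ["Anno/identity_CelebA.txt"]
OPTIONAL_FILES = [
    "Anno/list_attr_celeba.txt",
    "Anno/list_bbox_celeba.txt",
    "Anno/list_landmarks_celeba.txt",
    "Eval/list_eval_partition.txt",
    "dataset_meta.json",
]
IMAGE_DIR = "Img/img_celeba"  # assumed to be required by this sketch


def check_celeba_layout(root):
    """Return (missing_required, missing_optional) as lists of relative paths."""
    root = Path(root)
    missing_required = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    if not (root / IMAGE_DIR).is_dir():
        missing_required.append(IMAGE_DIR)
    missing_optional = [p for p in OPTIONAL_FILES if not (root / p).is_file()]
    return missing_required, missing_optional
```

A call such as `check_celeba_layout('<path/to/dataset>')` returns two lists; an empty first list means the required parts are in place, and the second list shows which optional annotations would be absent after import.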

The original CelebA dataset stores images in a .7z archive. The archive needs to be unpacked before importing.

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a CelebA dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports labels, bounding boxes or landmarks.

A CelebA dataset can be converted to other dataset formats using the CLI, either through a project or with a single convert command:

datum project create
datum project import -f celeba <path/to/dataset>
datum project export -f imagenet_txt -o ./save_dir -- --save-media


datum convert -if celeba -i <path/to/dataset> \
    -f imagenet_txt -o <output/dir> -- --save-media

Or, using the Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')
dataset.export('save_dir', 'voc')
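To choose a target format that supports what the dataset actually contains, it may be useful to count the annotation types first. The sketch below is an illustration under stated assumptions: the helper names `count_annotation_types` and `summarize_celeba` are hypothetical, and the code relies on dataset items exposing an `annotations` list whose entries have a `type` enum member.

```python
from collections import Counter


def count_annotation_types(dataset):
    """Count annotations by type name across all items of a dataset."""
    counts = Counter()
    for item in dataset:
        for ann in item.annotations:
            # `type` is assumed to be an enum member, e.g. label, bbox, points.
            counts[ann.type.name] += 1
    return counts


def summarize_celeba(path):
    """Import a CelebA dataset from `path` and count its annotation types."""
    # Imported lazily so count_annotation_types stays usable without Datumaro.
    import datumaro as dm

    dataset = dm.Dataset.import_from(path, "celeba")
    return count_annotation_types(dataset)
```

If the resulting counts contain only labels, a classification format is a natural target; if bounding boxes or landmarks are present, a format that can store them avoids losing annotations on export.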


Examples of using this format from code can be found in the format tests.