Format specification

LFW (Labeled Faces in the Wild Home) is a dataset for the face identification task. The specification for this format is available here. You can also download the original LFW dataset here.

The original dataset contains images of people's faces. Each image comes with the person's name, as well as information about the images that are matched and mismatched with this person. LFW also contains additional information about landmark points on the face.

Supported annotation types:

  • Label

  • Points (face landmark points)

Supported attributes:

  • negative_pairs: list of names of mismatched persons;

  • positive_pairs: list of names of matched persons.
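To make the two attributes more concrete, here is a minimal sketch of how matched and mismatched pairs could be grouped into per-image `positive_pairs` and `negative_pairs` lists. The function name `build_pair_attributes` and the image identifiers are hypothetical, and recording each pair only on its first image is an assumption made for illustration; this is not Datumaro's own implementation.

```python
from collections import defaultdict


def build_pair_attributes(matched, mismatched):
    """Group (image_a, image_b) pairs into per-image attribute dicts.

    Assumption: a pair is recorded on its first image only. The identifiers
    are hypothetical; the ids Datumaro assigns to items may look different.
    """
    attributes = defaultdict(
        lambda: {"positive_pairs": [], "negative_pairs": []}
    )
    for first, second in matched:
        attributes[first]["positive_pairs"].append(second)
    for first, second in mismatched:
        attributes[first]["negative_pairs"].append(second)
    return dict(attributes)


if __name__ == "__main__":
    attrs = build_pair_attributes(
        matched=[("name0_0001", "name0_0002")],
        mismatched=[("name0_0001", "name1_0001")],
    )
    print(attrs)
```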

Import LFW dataset

Importing an LFW dataset into a Datumaro project:

datum project create
datum project import -f lfw <path_to_lfw_dataset>

See more information about adding datasets to the project in the docs.

You can also import an LFW dataset with the Python API:

import datumaro as dm

lfw_dataset = dm.Dataset.import_from('<path_to_lfw_dataset>', 'lfw')

For the LFW dataset to be imported successfully, its directory should have the following structure:

├── subset_1
│    ├── annotations
│    │   ├── landmarks.txt # list of landmark points for each image
│    │   ├── pairs.txt # list of matched and mismatched pairs of persons
│    │   └── people.txt # optional file with a list of person names
│    └── images
│        ├── name0
│        │   ├── name0_0001.jpg
│        │   ├── name0_0002.jpg
│        │   ├── ...
│        ├── name1
│        │   ├── name1_0001.jpg
│        │   ├── name1_0002.jpg
│        │   ├── ...
├── subset_2
│    ├── ...
├── ...
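Before importing, it can help to check that a directory follows the layout above. The following is a minimal sketch using only the Python standard library. The helper name `check_lfw_layout` is hypothetical, and the checks it performs (that `pairs.txt` is required, and that every image file name starts with its person directory name) are assumptions drawn from the tree shown here, not rules taken from Datumaro itself.

```python
from pathlib import Path


def check_lfw_layout(root):
    """Return a list of problems found in an LFW-style directory tree."""
    root = Path(root)
    problems = []
    subsets = sorted(p for p in root.iterdir() if p.is_dir())
    if not subsets:
        problems.append(f"{root}: no subset directories found")
    for subset in subsets:
        # Assumption: pairs.txt is the one annotation file that must exist.
        if not (subset / "annotations" / "pairs.txt").is_file():
            problems.append(f"{subset.name}: missing annotations/pairs.txt")
        images = subset / "images"
        if not images.is_dir():
            problems.append(f"{subset.name}: missing images directory")
            continue
        for person in sorted(p for p in images.iterdir() if p.is_dir()):
            for image in sorted(person.iterdir()):
                # Images are expected to be named <person>_<index>.<ext>.
                if not image.name.startswith(person.name + "_"):
                    problems.append(
                        f"{subset.name}: unexpected file name "
                        f"{person.name}/{image.name}"
                    )
    return problems


if __name__ == "__main__":
    for problem in check_lfw_layout("./lfw_dataset"):
        print(problem)
```

An empty result means that none of these checks failed; it does not guarantee that the import will succeed.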

A full description of the annotation *.txt files is available here.
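As an illustration of what `pairs.txt` carries, here is a minimal parsing sketch. It assumes the line layout used by the original LFW `pairs.txt`: a matched pair is written as `name n1 n2`, a mismatched pair as `name1 n1 name2 n2`, and image names use a four-digit index as in the tree above. The function names `parse_pairs` and `image_id` are hypothetical; files written by Datumaro may differ in detail, so treat this as a sketch rather than a reference parser.

```python
def image_id(name, index):
    """Build an image name such as name0_0001 from a name and an index."""
    return f"{name}_{int(index):04d}"


def parse_pairs(lines):
    """Split pairs.txt lines into matched and mismatched image pairs."""
    matched, mismatched = [], []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            # Same person, two image indices.
            name, first, second = fields
            matched.append((image_id(name, first), image_id(name, second)))
        elif len(fields) == 4:
            # Two different persons, one image index each.
            name_a, index_a, name_b, index_b = fields
            mismatched.append(
                (image_id(name_a, index_a), image_id(name_b, index_b))
            )
        # Any other line (header counts, blank lines) is skipped.
    return matched, mismatched


if __name__ == "__main__":
    with open("./lfw_dataset/subset_1/annotations/pairs.txt") as f:
        matched, mismatched = parse_pairs(f)
    print(len(matched), "matched pairs,", len(mismatched), "mismatched pairs")
```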

Export LFW dataset

With Datumaro you can convert an LFW dataset into any other format Datumaro supports. Note that the target format should also support the Label and/or Points annotation types.

There are a few ways to convert an LFW dataset into another format:

# Converting to ImageNet with `convert` command:
datum convert -if lfw -i ./lfw_dataset \
    -f imagenet -o ./output_dir -- --save-media

# Converting to VggFace2 through the Datumaro project:
datum project create
datum project add -f lfw ./lfw_dataset
datum project export -f vgg_face2 -o ./output_dir2

Note: some formats have extra export options. See the docs for the particular format for details.

Export dataset to the LFW format

With Datumaro you can export a dataset that has Label and/or Points annotations to the LFW format, for example:

# Converting a VGG Face2 dataset into the LFW format
datum convert -if vgg_face2 -i ./vgg_face2_dataset \
    -f lfw -o ./output_dir

# Export a dataset to the LFW format through the Datumaro project:
datum project create
datum project import -f voc_classification ../voc_dataset
datum project export -f lfw -o ./output_dir -- --save-media --image-ext png

Available extra export options for LFW dataset format:

  • --save-media: export the dataset together with its media files (default: False)

  • --image-ext IMAGE_EXT: image extension to use for the exported dataset (default: keep the original)
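When conversions are scripted, the command line can be assembled programmatically. The sketch below builds a `datum convert` argument list using only the flags that appear in this section (`-if`, `-i`, `-f`, `-o`, `--save-media`, `--image-ext`). The helper name `build_convert_command` and its parameters are hypothetical, and the sketch only constructs the command; running it assumes the `datum` CLI is installed and on `PATH`.

```python
import shlex


def build_convert_command(input_format, input_path, output_format,
                          output_dir, save_media=False, image_ext=None):
    """Build an argument list for `datum convert`."""
    command = [
        "datum", "convert",
        "-if", input_format, "-i", str(input_path),
        "-f", output_format, "-o", str(output_dir),
    ]
    # Format-specific export options go after the `--` separator.
    extra = []
    if save_media:
        extra.append("--save-media")
    if image_ext:
        extra += ["--image-ext", image_ext]
    if extra:
        command += ["--", *extra]
    return command


if __name__ == "__main__":
    command = build_convert_command(
        "vgg_face2", "./vgg_face2_dataset", "lfw", "./output_dir",
        save_media=True, image_ext="png",
    )
    print(shlex.join(command))
    # To actually run the conversion (requires Datumaro to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.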