# Import and Export Public Semantic Segmentation Data

[![Jupyter Notebook](https://img.shields.io/badge/jupyter-%23FA0F00.svg?style=for-the-badge&logo=jupyter&logoColor=white)](https://github.com/openvinotoolkit/datumaro/blob/develop/notebooks/14_import_export_seg_data.ipynb)

In this notebook, we import and export semantic segmentation data with Datumaro.

## Import Cityscapes data

Cityscapes is one of the most popular benchmark datasets for semantic segmentation.

It provides both fine and coarse ground truths with pixel-wise classification, i.e., every pixel is annotated with one class out of 20 categories in total.

You can download it for free [here](https://www.cityscapes-dataset.com/dataset-overview/)!
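
To make pixel-wise classification concrete, here is a minimal sketch that does not depend on Datumaro: a tiny hand-written label mask in which every pixel stores one class index, together with a per-class pixel count. The 3x4 mask, the three-label subset, and the `count_pixels_per_class` helper are illustrative assumptions, not Cityscapes data or Datumaro API.

```python
from collections import Counter

# Illustrative subset of class names; the position in the list is the class index.
labels = ["road", "sidewalk", "building"]

# A toy 3x4 label mask: every pixel holds exactly one class index.
mask = [
    [2, 2, 2, 2],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]


def count_pixels_per_class(mask, labels):
    """Return a {class name: pixel count} mapping for a 2D label mask."""
    counts = Counter(index for row in mask for index in row)
    return {name: counts.get(i, 0) for i, name in enumerate(labels)}


pixel_counts = count_pixels_per_class(mask, labels)
print(pixel_counts)
```

Because each pixel belongs to exactly one class, the per-class counts always add up to the number of pixels in the mask.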

In [1]:
from datumaro.components.environment import DEFAULT_ENVIRONMENT
from datumaro.components.dataset import Dataset

cityscapes_path = "cityscapes"

# Detect candidate dataset formats from the directory layout
detected_formats = DEFAULT_ENVIRONMENT.detect_dataset(cityscapes_path)
print(detected_formats)

# Import the dataset using the first detected format
cityscapes_dataset = Dataset.import_from(cityscapes_path, detected_formats[0])
print(cityscapes_dataset)

['cityscapes']
Dataset
	size=5000
	source_path=cityscapes
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=5000
	annotations_count=45729
subsets
	test: # of items=1525, # of annotated items=1525, # of annotations=1525, annotation types=['mask']
	train: # of items=2975, # of annotated items=2975, # of annotations=37698, annotation types=['mask']
	val: # of items=500, # of annotated items=500, # of annotations=6506, annotation types=['mask']
infos
	categories
	label: ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole', 'trafficlight', 'trafficsign', 'vegetation', 'terrain', 'sky', 'person', 'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'background']
	mask: []
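
The import cell above indexes `detected_formats[0]` directly, which would fail with an `IndexError` if the returned list were empty. A minimal defensive sketch follows. `choose_format` is a hypothetical helper (not part of Datumaro) and assumes the detection result is a plain list of format names, as in the printed output above.

```python
def choose_format(detected_formats, preferred=None):
    """Pick one format name from a list of detected format names."""
    if not detected_formats:
        raise ValueError("No dataset format was detected; check the dataset path.")
    if preferred is not None:
        # Only accept the preferred format if detection actually reported it.
        if preferred not in detected_formats:
            raise ValueError(f"{preferred!r} is not among {detected_formats}")
        return preferred
    # Fall back to the first detected format, as the notebook cell does.
    return detected_formats[0]


print(choose_format(["cityscapes"]))
```

The returned name could then be passed as the second argument to `Dataset.import_from`.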



## Import ADE20K data

We now import another public semantic segmentation dataset, ADE20K.

It is also freely available to the public. Click [here](https://groups.csail.mit.edu/vision/datasets/ADE20K/)!

In [5]:
ade20k_path = "ADE20K_2021_17_01/images/ADE/"

# Detect candidate dataset formats from the directory layout
detected_formats = DEFAULT_ENVIRONMENT.detect_dataset(ade20k_path)
print(detected_formats)

# Import the dataset using the first detected format
ade20k_dataset = Dataset.import_from(ade20k_path, detected_formats[0])
print(ade20k_dataset)

['ade20k2020']
Dataset
	size=27574
	source_path=ADE20K_2021_17_01/images/ADE/
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=27574
	annotations_count=1752558
subsets
	training: # of items=25574, # of annotated items=25574, # of annotations=1576969, annotation types=['polygon', 'mask']
	validation: # of items=2000, # of annotated items=2000, # of annotations=175589, annotation types=['polygon', 'mask']
infos
	categories
	label: ['wall', 'window', 'podium', 'altar', 'ceiling', 'vault', 'column', 'floor', 'bench', 'seats', 'capital', 'shaft', 'base', 'cross', 'staircase', 'railing', 'steps', 'pendant lamp', 'mezzanine', 'door', 'plant', 'candle', 'sculpture', 'pulpit', 'candelabra', 'aquarium', 'plants', 'person', 'head', 'eye', 'mouth', 'right arm', 'right hand', 'left arm', 'left hand', 'right leg', 'right foot', 'left leg', 'left foot', 'sign', 'hair', 'back', 'fence', 'fish', 'rocks', 'text', 'gate', 'people', 'columns', 'console', 'gaze', 'neck', 'shark'

### Export ADE20K data into Cityscapes data format

We now export the imported ADE20K dataset into the Cityscapes format, as we showed in the [previous notebook example](https://github.com/openvinotoolkit/datumaro/blob/develop/notebooks/14_import_export_det_data.ipynb).

Because the entire ADE20K dataset is large, we export only the validation set into the Cityscapes format below.
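
For orientation, the official Cityscapes release stores images under `leftImg8bit/<split>/<city>/` and fine annotations under `gtFine/<split>/<city>/`, with file names built from the city, a sequence number, and a frame number. The sketch below only illustrates that public naming convention. `cityscapes_paths` is a hypothetical helper, and Datumaro's exporter may organize its output differently, so the directory listing of the exported dataset remains the reference.

```python
from pathlib import PurePosixPath


def cityscapes_paths(root, split, city, seq, frame):
    """Build image and label-mask paths following the official Cityscapes naming."""
    # File stems combine the city name with zero-padded sequence and frame numbers.
    stem = f"{city}_{seq:06d}_{frame:06d}"
    root = PurePosixPath(root)
    image = root / "leftImg8bit" / split / city / f"{stem}_leftImg8bit.png"
    labels = root / "gtFine" / split / city / f"{stem}_gtFine_labelIds.png"
    return image, labels


image_path, label_path = cityscapes_paths("cityscapes", "val", "frankfurt", 0, 294)
print(image_path)
print(label_path)
```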

In [12]:
ade20k_val_dataset = ade20k_dataset.get_subset("validation").as_dataset()

In [15]:
print("Original ADE20K data format")
!tree -L 2 ./ADE20K_2021_17_01/images/ADE

save_path = "ade20k_with_cityscapes_format"
ade20k_val_dataset.export(save_path, "cityscapes", save_media=True)

print("Reformulated ADE20K data with Cityscapes format")
!tree -L 2 ./ade20k_with_cityscapes_format

Original ADE20K data format
./ADE20K_2021_17_01/images/ADE
├── training
│   ├── cultural
│   ├── home_or_hotel
│   ├── industrial
│   ├── nature_landscape
│   ├── shopping_and_dining
│   ├── sports_and_leisure
│   ├── transportation
│   ├── unclassified
│   ├── urban
│   └── work_place
└── validation
    ├── cultural
    ├── home_or_hotel
    ├── industrial
    ├── nature_landscape
    ├── shopping_and_dining
    ├── sports_and_leisure
    ├── transportation
    ├── unclassified
    ├── urban
    └── work_place

22 directories, 0 files
Reformulated ADE20K data with Cityscapes format
./ade20k_with_cityscapes_format
├── gtFine
│   └── validation
├── [