Visualize Your Data#
In this notebook example, we take a look at the Datumaro visualization Python API. Specifically, we provide example code for the instance segmentation, panoptic segmentation, and captioning tasks with the MS-COCO 2017 dataset.
Prerequisite#
Download COCO 2017 validation dataset#
[2]:
!mkdir coco_dataset
!curl http://images.cocodataset.org/zips/val2017.zip --output coco_dataset/val2017.zip --silent
!curl http://images.cocodataset.org/annotations/annotations_trainval2017.zip --output coco_dataset/annotations_trainval2017.zip --silent
!curl http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip --output coco_dataset/panoptic_annotations_trainval2017.zip --silent
Extract the downloaded dataset#
We extract all the downloaded files and remove the train2017 annotation files (e.g., instances_train2017.json), because we only downloaded the val2017 images. After running this cell, you will get the following directory structure.
coco_dataset
├── annotations
│   ├── captions_val2017.json
│   ├── instances_val2017.json
│   ├── panoptic_val2017
│   ├── panoptic_val2017.json
│   └── person_keypoints_val2017.json
├── images
│   └── val2017
[1]:
!unzip -q coco_dataset/val2017.zip -d coco_dataset/images
!unzip -q coco_dataset/annotations_trainval2017.zip -d coco_dataset
!unzip -q coco_dataset/panoptic_annotations_trainval2017.zip -d coco_dataset
!unzip -q coco_dataset/annotations/panoptic_val2017.zip -d coco_dataset/annotations
!find coco_dataset -name "*_train2017.json" | xargs rm
Visualize dataset#
Visualize COCO instance segmentation dataset#
[1]:
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
import datumaro as dm
dataset = dm.Dataset.import_from("coco_dataset", format="coco_instances")
print("Subset candidates:", dataset.subsets().keys())
subset = list(dataset.subsets().keys())[0] # val2017
print("Subset:", subset)
WARNING:root:File 'coco_dataset/annotations/panoptic_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/person_keypoints_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/captions_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
Subset candidates: dict_keys(['val2017'])
Subset: val2017
[2]:
def get_ids(dataset: dm.Dataset, subset: str):
ids = []
for item in dataset:
if item.subset == subset:
ids += [item.id]
return ids
ids = get_ids(dataset, subset)
print("DatasetItem ids:", ids[:4])
DatasetItem ids: ['000000397133', '000000037777', '000000252219', '000000087038']
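If your Datumaro version exposes Dataset.get_subset(), you can collect the same ids without the helper above. A minimal sketch under that assumption:
# Assumption: Dataset.get_subset(name) returns an iterable view of that subset.
ids = [item.id for item in dataset.get_subset(subset)]
print("DatasetItem ids:", ids[:4])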
In this cell, we only draw 4 images (ids[:4]) without any annotations (by setting alpha=0).
[3]:
visualizer = dm.Visualizer(dataset, figsize=(16, 12), alpha=0)
fig = visualizer.vis_gallery(ids[:4], subset, (None, None))
fig.show()

This time, we draw polygon annotations (instance segmentation task) by setting alpha=0.7. The grid size of the gallery is automatically inferred as (2, 2) from the number of input samples.
[4]:
visualizer = dm.Visualizer(dataset, figsize=(16, 12), alpha=0.7)
fig = visualizer.vis_gallery(ids[:4], subset, grid_size=(None, None))
fig.show()

In this example, we draw the 4 samples in a single row, so the gallery has a (1, 4) grid, specified by grid_size=(1, 4).
[5]:
visualizer = dm.Visualizer(dataset, figsize=(16, 8), alpha=0.7)
fig = visualizer.vis_gallery(ids[:4], subset, grid_size=(1, 4))
fig.show()
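Since vis_gallery() returns a standard matplotlib Figure, you can also write the rendered gallery to disk instead of only showing it. A minimal sketch (the output file name is just an example):
# Save the gallery to a PNG file; "gallery.png" is an illustrative path.
fig.savefig("gallery.png", bbox_inches="tight")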

It is also possible to visualize samples one by one.
[6]:
fig = visualizer.vis_one_sample(ids[5], subset)
fig.show()
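To render several samples individually, you can call vis_one_sample() in a loop. A minimal sketch (the output file names are illustrative only):
# Draw and save each of the first four samples one by one.
for item_id in ids[:4]:
    fig = visualizer.vis_one_sample(item_id, subset)
    fig.savefig(f"{item_id}.png", bbox_inches="tight")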

Visualize COCO panoptic segmentation dataset#
[8]:
dataset = dm.Dataset.import_from("coco_dataset", format="coco_panoptic")
ids = get_ids(dataset, subset)
print("DatasetItem ids:", ids[5:9])
visualizer = dm.Visualizer(dataset, figsize=(16, 12), alpha=0.5)
fig = visualizer.vis_gallery(ids[5:9], subset, grid_size=(None, None))
fig.show()
WARNING:root:File 'coco_dataset/annotations/instances_val2017.json' was skipped, could't match this file with any of these tasks: coco_panoptic
WARNING:root:File 'coco_dataset/annotations/person_keypoints_val2017.json' was skipped, could't match this file with any of these tasks: coco_panoptic
WARNING:root:File 'coco_dataset/annotations/captions_val2017.json' was skipped, could't match this file with any of these tasks: coco_panoptic
DatasetItem ids: ['000000403385', '000000006818', '000000480985', '000000458054']

Visualize COCO captions dataset#
[9]:
dataset = dm.Dataset.import_from("coco_dataset", format="coco_captions")
ids = get_ids(dataset, subset)
print("DatasetItem ids:", ids[20:24])
visualizer = dm.Visualizer(dataset, figsize=(16, 12), alpha=0.7)
fig = visualizer.vis_gallery(ids[20:24], subset, grid_size=(None, None))
fig.show()
WARNING:root:File 'coco_dataset/annotations/panoptic_val2017.json' was skipped, could't match this file with any of these tasks: coco_captions
WARNING:root:File 'coco_dataset/annotations/instances_val2017.json' was skipped, could't match this file with any of these tasks: coco_captions
WARNING:root:File 'coco_dataset/annotations/person_keypoints_val2017.json' was skipped, could't match this file with any of these tasks: coco_captions
DatasetItem ids: ['000000143931', '000000303818', '000000463730', '000000460347']
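Besides drawing the captions on the images, you can read them programmatically from the dataset items. A minimal sketch, assuming the dm.Caption annotation type and Dataset.get() are available in your Datumaro version:
# Print the caption annotations attached to the first visualized item.
item = dataset.get(ids[20], subset)
for ann in item.annotations:
    if isinstance(ann, dm.Caption):
        print(ann.caption)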
