Transform Dataset: Re-id, Reindexing, Remapping, etc.


In this notebook example, we will take a look at the Datumaro transform API. Transforms cover splitting and merging subsets, redefining annotation information, re-identifying media, and changing the task by converting the annotation format, e.g., from masks to polygons, from bounding boxes to masks, or from shapes to bounding boxes.

Prerequisite

Download COCO 2017 validation dataset

Please refer to openvinotoolkit/datumaro to prepare the COCO 2017 validation dataset.

[2]:
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

import datumaro as dm

dataset = dm.Dataset.import_from("coco_dataset", format="coco_instances")

print("Representation for sample COCO dataset")
dataset
WARNING:root:File 'coco_dataset/annotations/panoptic_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/panoptic_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/person_keypoints_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/captions_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/person_keypoints_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/captions_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
Representation for sample COCO dataset
[2]:
Dataset
        size=123287
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=122218
        annotations_count=1018861
subsets
        train2017: # of items=118287, # of annotated items=117266, # of annotations=976995, annotation types=['mask', 'polygon']
        val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

Transform media ID

We first modify the media_id through a transformation. The original media_id values are shown below.

[3]:
subsets = list(dataset.subsets().keys())
print("Subset candidates:", subsets)


def get_ids(dataset: dm.Dataset, subset: str):
    """Collect the IDs of all items that belong to the given subset."""
    ids = []
    for item in dataset:
        if item.subset == subset:
            ids.append(item.id)

    return ids


get_ids(dataset, subsets[0])
Subset candidates: ['val2017', 'train2017']
[3]:
['000000397133',
 '000000037777',
 '000000252219',
 '000000087038',
 '000000174482',
 '000000403385',
 '000000006818',
 '000000480985',
 '000000458054',
 '000000331352',
 '000000296649',
 '000000386912',
 '000000502136',
 '000000491497',
 '000000184791',
 '000000348881',
 '000000289393',
 '000000522713',
 '000000181666',
 '000000017627',
 '000000143931',
 '000000303818',
 '000000463730',
 '000000460347',
 '000000322864',
 '000000226111',
 '000000153299',
 '000000308394',
 '000000456496',
 '000000058636',
 '000000041888',
 '000000184321',
 '000000565778',
 '000000297343',
 '000000336587',
 '000000122745',
 '000000219578',
 '000000555705',
 '000000443303',
 '000000500663',
 '000000418281',
 '000000025560',
 '000000403817',
 '000000085329',
 '000000329323',
 '000000239274',
 '000000286994',
 '000000511321',
 '000000314294',
 '000000233771',
 '000000475779',
 '000000301867',
 '000000312421',
 '000000185250',
 '000000356427',
 '000000572517',
 '000000270244',
 '000000516316',
 '000000125211',
 '000000562121',
 '000000360661',
 '000000016228',
 '000000382088',
 '000000266409',
 '000000430961',
 '000000080671',
 '000000577539',
 '000000104612',
 '000000476258',
 '000000448365',
 '000000035197',
 '000000349860',
 '000000180135',
 '000000486438',
 '000000400573',
 '000000109798',
 '000000370677',
 '000000238866',
 '000000369370',
 '000000502737',
 '000000515579',
 '000000515445',
 '000000173383',
 '000000438862',
 '000000180560',
 '000000347693',
 '000000039956',
 '000000321214',
 '000000474028',
 '000000066523',
 '000000355257',
 '000000142092',
 '000000063154',
 '000000199551',
 '000000239347',
 '000000514508',
 '000000473237',
 '000000228144',
 '000000206027',
 '000000078915',
 '000000551215',
 '000000544519',
 '000000096493',
 '000000023899',
 '000000340175',
 '000000578500',
 '000000366141',
 '000000057597',
 '000000559842',
 '000000434230',
 '000000428454',
 '000000399462',
 '000000261061',
 '000000168330',
 '000000383384',
 '000000342006',
 '000000217285',
 '000000236412',
 '000000524456',
 '000000153343',
 '000000095786',
 '000000326541',
 '000000213086',
 '000000231339',
 '000000508730',
 '000000550426',
 '000000368294',
 '000000171190',
 '000000301135',
 '000000580294',
 '000000494869',
 '000000033638',
 '000000329219',
 '000000034873',
 '000000186980',
 '000000127182',
 '000000356387',
 '000000367680',
 '000000263796',
 '000000117425',
 '000000365387',
 '000000487583',
 '000000504711',
 '000000363840',
 '000000214720',
 '000000379453',
 '000000311295',
 '000000029393',
 '000000278848',
 '000000166391',
 '000000048153',
 '000000459153',
 '000000295713',
 '000000223130',
 '000000273132',
 '000000198960',
 '000000344059',
 '000000410428',
 '000000087875',
 '000000450758',
 '000000458790',
 '000000460160',
 '000000458109',
 '000000030675',
 '000000566524',
 '000000338428',
 '000000545826',
 '000000166277',
 '000000269314',
 '000000476415',
 '000000292082',
 '000000360137',
 '000000122046',
 '000000352684',
 '000000512836',
 '000000008021',
 '000000107226',
 '000000084477',
 '000000562243',
 '000000181859',
 '000000177015',
 '000000292236',
 '000000121506',
 '000000288042',
 '000000453860',
 '000000500257',
 '000000113403',
 '000000125062',
 '000000375015',
 '000000334719',
 '000000134112',
 '000000283520',
 '000000031269',
 '000000319721',
 '000000165351',
 '000000347265',
 '000000414170',
 '000000231508',
 '000000389381',
 '000000118921',
 '000000021503',
 '000000000785',
 '000000300842',
 '000000105014',
 '000000261982',
 '000000034205',
 '000000099242',
 '000000314709',
 '000000460494',
 '000000339442',
 '000000541055',
 '000000409475',
 '000000464786',
 '000000378605',
 '000000331817',
 '000000218091',
 '000000578545',
 '000000363207',
 '000000372577',
 '000000212166',
 '000000172571',
 '000000294831',
 '000000084431',
 '000000323355',
 '000000355325',
 '000000100582',
 '000000555412',
 '000000004495',
 '000000009483',
 '000000326082',
 '000000398237',
 '000000507223',
 '000000031050',
 '000000239537',
 '000000340930',
 '000000011813',
 '000000281414',
 '000000537991',
 '000000284282',
 '000000321333',
 '000000521282',
 '000000108026',
 '000000243204',
 '000000177935',
 '000000038829',
 '000000397327',
 '000000501523',
 '000000555050',
 '000000376442',
 '000000187243',
 '000000356347',
 '000000293044',
 '000000560279',
 '000000042276',
 '000000534827',
 '000000190756',
 '000000482917',
 '000000300659',
 '000000199977',
 '000000442480',
 '000000384350',
 '000000383621',
 '000000189828',
 '000000412894',
 '000000537153',
 '000000361103',
 '000000392722',
 '000000338560',
 '000000264535',
 '000000295231',
 '000000154947',
 '000000212559',
 '000000458755',
 '000000104782',
 '000000315257',
 '000000130599',
 '000000227187',
 '000000151662',
 '000000461275',
 '000000523811',
 '000000456559',
 '000000101068',
 '000000140640',
 '000000516708',
 '000000544605',
 '000000385190',
 '000000338986',
 '000000053994',
 '000000061171',
 '000000314034',
 '000000291490',
 '000000152740',
 '000000024919',
 '000000079837',
 '000000021903',
 '000000564133',
 '000000337055',
 '000000110638',
 '000000034139',
 '000000080340',
 '000000083113',
 '000000173033',
 '000000255664',
 '000000072813',
 '000000545129',
 '000000546011',
 '000000121031',
 '000000172547',
 '000000369081',
 '000000509131',
 '000000578922',
 '000000464089',
 '000000453708',
 '000000177714',
 '000000459887',
 '000000155179',
 '000000261116',
 '000000396274',
 '000000029640',
 '000000141328',
 '000000308430',
 '000000043314',
 '000000273715',
 '000000456303',
 '000000406611',
 '000000475064',
 '000000466567',
 '000000137246',
 '000000015079',
 '000000296284',
 '000000226147',
 '000000226903',
 '000000127517',
 '000000162092',
 '000000131379',
 '000000366611',
 '000000263969',
 '000000551439',
 '000000474167',
 '000000159458',
 '000000554735',
 '000000099428',
 '000000386352',
 '000000173004',
 '000000311394',
 '000000578489',
 '000000189310',
 '000000491366',
 '000000448076',
 '000000293804',
 '000000312237',
 '000000221291',
 '000000141821',
 '000000410650',
 '000000199310',
 '000000323151',
 '000000089648',
 '000000219283',
 '000000471869',
 '000000520264',
 '000000111179',
 '000000151000',
 '000000100624',
 '000000332570',
 '000000057238',
 '000000502732',
 '000000135561',
 '000000008277',
 '000000173044',
 '000000168458',
 '000000512194',
 '000000370042',
 '000000189436',
 '000000533958',
 '000000117645',
 '000000221708',
 '000000202228',
 '000000403565',
 '000000211042',
 '000000492878',
 '000000441586',
 '000000547816',
 '000000306733',
 '000000530099',
 '000000312278',
 '000000097679',
 '000000564127',
 '000000251065',
 '000000003845',
 '000000138819',
 '000000205834',
 '000000348708',
 '000000166521',
 '000000485802',
 '000000099054',
 '000000022969',
 '000000570539',
 '000000278353',
 '000000158548',
 '000000461405',
 '000000176606',
 '000000044699',
 '000000559956',
 '000000268996',
 '000000011197',
 '000000483667',
 '000000448810',
 '000000000724',
 '000000051961',
 '000000375278',
 '000000302165',
 '000000131131',
 '000000098839',
 '000000402992',
 '000000465675',
 '000000240754',
 '000000021167',
 '000000148730',
 '000000384468',
 '000000253742',
 '000000186873',
 '000000082180',
 '000000446522',
 '000000552902',
 '000000125405',
 '000000110211',
 '000000016010',
 '000000064462',
 '000000314182',
 '000000248980',
 '000000068387',
 '000000429281',
 '000000345466',
 '000000352900',
 '000000118367',
 '000000113235',
 '000000311303',
 '000000163640',
 '000000370999',
 '000000001490',
 '000000329456',
 '000000570471',
 '000000088269',
 '000000260470',
 '000000193494',
 '000000252776',
 '000000201072',
 '000000018150',
 '000000337498',
 '000000521405',
 '000000518770',
 '000000201646',
 '000000036936',
 '000000059044',
 '000000172946',
 '000000234607',
 '000000532690',
 '000000323895',
 '000000384670',
 '000000050326',
 '000000205542',
 '000000217957',
 '000000162035',
 '000000415727',
 '000000046252',
 '000000182021',
 '000000231747',
 '000000090284',
 '000000286553',
 '000000488736',
 '000000063602',
 '000000383386',
 '000000450686',
 '000000005060',
 '000000286523',
 '000000120420',
 '000000579655',
 '000000117908',
 '000000550322',
 '000000322844',
 '000000218362',
 '000000213224',
 '000000223747',
 '000000297578',
 '000000458992',
 '000000078266',
 '000000164602',
 '000000440475',
 '000000101762',
 '000000557501',
 '000000203317',
 '000000368940',
 '000000569917',
 '000000144798',
 '000000284623',
 '000000520301',
 '000000127987',
 '000000063740',
 '000000036494',
 '000000210032',
 '000000488270',
 '000000067180',
 '000000281179',
 '000000064359',
 '000000126226',
 '000000190923',
 '000000150265',
 '000000216739',
 '000000038048',
 '000000354829',
 '000000525155',
 '000000163314',
 '000000259571',
 '000000561679',
 '000000236166',
 '000000153529',
 '000000473015',
 '000000379800',
 '000000253835',
 '000000034071',
 '000000036861',
 '000000569565',
 '000000219271',
 '000000205647',
 '000000460841',
 '000000123131',
 '000000334006',
 '000000511599',
 '000000229858',
 '000000174004',
 '000000519764',
 '000000137576',
 '000000087470',
 '000000009769',
 '000000558114',
 '000000205776',
 '000000163257',
 '000000475678',
 '000000085478',
 '000000318080',
 '000000361551',
 '000000236784',
 '000000092839',
 '000000042296',
 '000000560266',
 '000000486479',
 '000000127955',
 '000000307658',
 '000000417465',
 '000000342971',
 '000000011760',
 '000000069106',
 '000000070158',
 '000000176634',
 '000000281447',
 '000000552371',
 '000000361919',
 '000000560256',
 '000000138115',
 '000000114871',
 '000000374369',
 '000000123213',
 '000000123321',
 '000000015278',
 '000000357742',
 '000000439854',
 '000000465836',
 '000000414385',
 '000000131556',
 '000000322724',
 '000000320664',
 '000000481390',
 '000000109916',
 '000000276434',
 '000000579635',
 '000000295316',
 '000000571313',
 '000000183127',
 '000000115898',
 '000000146358',
 '000000329542',
 '000000189752',
 '000000290163',
 '000000091406',
 '000000322352',
 '000000223959',
 '000000326248',
 '000000218439',
 '000000453722',
 '000000293625',
 '000000411817',
 '000000546964',
 '000000215259',
 '000000573094',
 '000000560011',
 '000000038576',
 '000000147729',
 '000000579307',
 '000000154425',
 '000000432898',
 '000000404923',
 '000000130586',
 '000000163057',
 '000000007511',
 '000000067406',
 '000000290179',
 '000000248752',
 '000000054593',
 '000000116208',
 '000000340697',
 '000000450303',
 '000000494427',
 '000000137294',
 '000000410880',
 '000000311180',
 '000000091654',
 '000000181796',
 '000000002431',
 '000000349184',
 '000000298396',
 '000000472046',
 '000000074058',
 '000000058029',
 '000000134096',
 '000000111951',
 '000000103585',
 '000000210273',
 '000000352584',
 '000000446651',
 '000000194875',
 '000000052017',
 '000000336309',
 '000000227478',
 '000000339870',
 '000000080666',
 '000000033707',
 '000000327601',
 '000000255749',
 '000000008762',
 '000000526392',
 '000000535578',
 '000000580757',
 '000000165039',
 '000000148719',
 '000000108440',
 '000000489842',
 '000000579818',
 '000000423229',
 '000000323828',
 '000000166287',
 '000000101420',
 '000000334555',
 '000000196759',
 '000000411665',
 '000000061418',
 '000000526751',
 '000000024021',
 '000000277020',
 '000000047828',
 '000000183716',
 '000000271997',
 '000000008532',
 '000000094336',
 '000000390555',
 '000000250282',
 '000000068409',
 '000000002299',
 '000000011051',
 '000000066038',
 '000000360960',
 '000000360097',
 '000000421455',
 '000000504589',
 '000000464522',
 '000000454750',
 '000000509735',
 '000000023034',
 '000000141671',
 '000000506656',
 '000000272566',
 '000000045728',
 '000000424551',
 '000000341719',
 '000000072795',
 '000000078959',
 '000000417285',
 '000000002157',
 '000000043816',
 '000000455555',
 '000000535306',
 '000000030504',
 '000000093353',
 '000000530052',
 '000000473118',
 '000000091779',
 '000000283113',
 '000000226130',
 '000000097278',
 '000000567640',
 '000000532493',
 '000000045550',
 '000000156643',
 '000000430056',
 '000000410456',
 '000000441286',
 '000000279541',
 '000000000885',
 '000000378284',
 '000000156076',
 '000000143572',
 '000000229849',
 '000000039551',
 '000000056344',
 '000000193348',
 '000000016958',
 '000000572678',
 '000000106235',
 '000000341681',
 '000000083172',
 '000000343524',
 '000000395801',
 '000000388056',
 '000000259690',
 '000000235836',
 '000000343218',
 '000000205105',
 '000000513283',
 '000000176446',
 '000000371677',
 '000000308531',
 '000000497599',
 '000000455352',
 '000000236914',
 '000000232684',
 '000000415238',
 '000000290843',
 '000000519522',
 '000000144784',
 '000000167486',
 '000000392228',
 '000000488673',
 '000000191013',
 '000000080057',
 '000000570169',
 '000000224807',
 '000000163562',
 '000000136355',
 '000000492362',
 '000000102707',
 '000000232563',
 '000000010977',
 '000000051598',
 '000000032285',
 '000000520910',
 '000000131273',
 '000000206411',
 '000000472375',
 '000000481404',
 '000000471991',
 '000000017436',
 '000000177934',
 '000000165518',
 '000000571718',
 '000000459467',
 '000000135673',
 '000000134886',
 '000000485895',
 '000000287545',
 '000000577182',
 '000000289222',
 '000000372819',
 '000000310072',
 '000000087144',
 '000000430875',
 '000000060347',
 '000000042070',
 '000000420916',
 '000000453584',
 '000000296224',
 '000000122606',
 '000000311909',
 '000000579893',
 '000000284296',
 '000000221017',
 '000000315001',
 '000000439715',
 '000000284991',
 '000000389566',
 '000000078843',
 '000000122927',
 '000000225532',
 '000000013659',
 '000000153568',
 '000000395633',
 '000000419096',
 '000000203488',
 '000000361268',
 '000000466125',
 '000000414795',
 '000000508101',
 '000000253386',
 '000000222991',
 '000000530854',
 '000000351810',
 '000000338624',
 '000000138492',
 '000000263463',
 '000000226592',
 '000000378454',
 '000000020059',
 '000000227686',
 '000000476215',
 '000000297698',
 '000000247917',
 '000000439522',
 '000000479448',
 '000000424721',
 '000000026690',
 '000000558854',
 '000000176901',
 '000000334767',
 '000000301563',
 '000000086755',
 '000000194471',
 '000000420281',
 '000000533206',
 '000000099810',
 '000000334483',
 '000000089670',
 '000000482275',
 '000000404805',
 '000000002261',
 '000000425702',
 '000000036844',
 '000000012576',
 '000000361238',
 '000000108253',
 '000000319935',
 '000000003934',
 '000000029596',
 '000000047740',
 '000000077460',
 '000000014439',
 '000000571893',
 '000000447314',
 '000000181303',
 '000000058350',
 '000000026465',
 '000000246968',
 '000000536947',
 '000000076731',
 '000000286182',
 '000000433980',
 '000000561366',
 '000000380913',
 '000000032887',
 '000000517687',
 '000000213035',
 '000000399205',
 '000000349837',
 '000000350002',
 '000000131431',
 '000000356248',
 '000000334399',
 '000000057150',
 '000000363666',
 '000000507235',
 '000000169996',
 '000000226417',
 '000000481573',
 '000000056127',
 '000000123480',
 '000000274687',
 '000000164637',
 '000000178028',
 '000000493286',
 '000000348216',
 '000000345027',
 '000000571804',
 '000000140658',
 '000000102644',
 '000000581615',
 '000000279887',
 '000000230008',
 '000000284698',
 '000000102356',
 '000000456394',
 '000000323709',
 '000000452122',
 '000000579158',
 '000000525322',
 '000000033114',
 '000000008690',
 '000000381639',
 '000000217614',
 '000000284445',
 '000000468124',
 '000000187144',
 '000000273198',
 '000000095843',
 '000000417779',
 '000000447342',
 '000000166563',
 '000000490125',
 '000000561009',
 '000000183675',
 '000000290248',
 '000000532058',
 '000000214200',
 '000000578093',
 '000000369751',
 '000000429011',
 '000000301061',
 '000000105264',
 '000000267434',
 '000000370711',
 '000000025393',
 '000000471087',
 '000000106757',
 '000000183648',
 '000000358525',
 '000000049269',
 '000000079144',
 '000000519688',
 '000000431727',
 '000000130699',
 '000000215245',
 '000000091921',
 '000000218424',
 '000000473974',
 '000000405249',
 '000000235784',
 '000000521540',
 '000000537506',
 '000000119445',
 '000000507015',
 '000000173830',
 '000000356498',
 '000000435081',
 '000000018575',
 '000000373315',
 '000000227765',
 '000000013546',
 '000000067310',
 '000000125936',
 '000000389109',
 '000000322211',
 '000000184384',
 '000000426329',
 '000000128476',
 '000000414034',
 '000000450488',
 '000000099182',
 '000000051738',
 '000000099039',
 '000000075456',
 '000000134882',
 '000000442323',
 '000000232489',
 '000000351823',
 '000000065736',
 '000000001000',
 '000000379842',
 '000000013923',
 '000000559543',
 '000000185890',
 '000000357978',
 '000000129492',
 '000000261097',
 '000000410510',
 '000000039951',
 '000000306700',
 '000000146457',
 '000000214224',
 '000000332845',
 '000000255483',
 '000000222455',
 '000000187271',
 '000000462629',
 '000000544565',
 '000000369771',
 '000000035963',
 '000000289516',
 '000000334309',
 '000000452084',
 '000000301718',
 '000000429598',
 '000000165257',
 '000000093437',
 '000000413552',
 '000000062025',
 '000000017379',
 '000000176778',
 '000000104572',
 '000000090108',
 '000000157124',
 '000000089556',
 '000000266206',
 '000000086220',
 '000000508602',
 ...]
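
The listing above is long because every ID in the subset is printed. A minimal sketch of a preview helper that stops after the first few matches, assuming only that items expose `id` and `subset` attributes as in `get_ids`. The `Item` stand-in, the `preview_ids` name, and the sample values are hypothetical and not part of Datumaro:

```python
from itertools import islice
from typing import Iterable, List, NamedTuple


class Item(NamedTuple):
    """Hypothetical stand-in for a dataset item (not a Datumaro class)."""

    id: str
    subset: str


def preview_ids(items: Iterable[Item], subset: str, limit: int = 5) -> List[str]:
    """Return at most `limit` IDs of the items that belong to `subset`."""
    matching = (item.id for item in items if item.subset == subset)
    # islice stops iterating once `limit` IDs have been collected.
    return list(islice(matching, limit))


# Made-up sample items, used only to exercise the helper.
items = [
    Item("000000397133", "val2017"),
    Item("000000000009", "train2017"),
    Item("000000037777", "val2017"),
    Item("000000252219", "val2017"),
]
print(preview_ids(items, "val2017", limit=2))
# prints ['000000397133', '000000037777']
```

Because `get_ids` already iterates the dataset item by item, the same generator expression would presumably work when a `dm.Dataset` is passed instead of the sample list.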

Here we apply the reindex transformation so that the media_id values become sequential numbers counting up from start.

[4]:
reindexing_dataset = dataset.transform("reindex", start=0)
get_ids(reindexing_dataset, subsets[0])
[4]:
['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '19',
 '20',
 '21',
 '22',
 '23',
 '24',
 '25',
 '26',
 '27',
 '28',
 '29',
 '30',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42',
 '43',
 '44',
 '45',
 '46',
 '47',
 '48',
 '49',
 '50',
 '51',
 '52',
 '53',
 '54',
 '55',
 '56',
 '57',
 '58',
 '59',
 '60',
 '61',
 '62',
 '63',
 '64',
 '65',
 '66',
 '67',
 '68',
 '69',
 '70',
 '71',
 '72',
 '73',
 '74',
 '75',
 '76',
 '77',
 '78',
 '79',
 '80',
 '81',
 '82',
 '83',
 '84',
 '85',
 '86',
 '87',
 '88',
 '89',
 '90',
 '91',
 '92',
 '93',
 '94',
 '95',
 '96',
 '97',
 '98',
 '99',
 '100',
 '101',
 '102',
 '103',
 '104',
 '105',
 '106',
 '107',
 '108',
 '109',
 '110',
 '111',
 '112',
 '113',
 '114',
 '115',
 '116',
 '117',
 '118',
 '119',
 '120',
 '121',
 '122',
 '123',
 '124',
 '125',
 '126',
 '127',
 '128',
 '129',
 '130',
 '131',
 '132',
 '133',
 '134',
 '135',
 '136',
 '137',
 '138',
 '139',
 '140',
 '141',
 '142',
 '143',
 '144',
 '145',
 '146',
 '147',
 '148',
 '149',
 '150',
 '151',
 '152',
 '153',
 '154',
 '155',
 '156',
 '157',
 '158',
 '159',
 '160',
 '161',
 '162',
 '163',
 '164',
 '165',
 '166',
 '167',
 '168',
 '169',
 '170',
 '171',
 '172',
 '173',
 '174',
 '175',
 '176',
 '177',
 '178',
 '179',
 '180',
 '181',
 '182',
 '183',
 '184',
 '185',
 '186',
 '187',
 '188',
 '189',
 '190',
 '191',
 '192',
 '193',
 '194',
 '195',
 '196',
 '197',
 '198',
 '199',
 '200',
 '201',
 '202',
 '203',
 '204',
 '205',
 '206',
 '207',
 '208',
 '209',
 '210',
 '211',
 '212',
 '213',
 '214',
 '215',
 '216',
 '217',
 '218',
 '219',
 '220',
 '221',
 '222',
 '223',
 '224',
 '225',
 '226',
 '227',
 '228',
 '229',
 '230',
 '231',
 '232',
 '233',
 '234',
 '235',
 '236',
 '237',
 '238',
 '239',
 '240',
 '241',
 '242',
 '243',
 '244',
 '245',
 '246',
 '247',
 '248',
 '249',
 '250',
 '251',
 '252',
 '253',
 '254',
 '255',
 '256',
 '257',
 '258',
 '259',
 '260',
 '261',
 '262',
 '263',
 '264',
 '265',
 '266',
 '267',
 '268',
 '269',
 '270',
 '271',
 '272',
 '273',
 '274',
 '275',
 '276',
 '277',
 '278',
 '279',
 '280',
 '281',
 '282',
 '283',
 '284',
 '285',
 '286',
 '287',
 '288',
 '289',
 '290',
 '291',
 '292',
 '293',
 '294',
 '295',
 '296',
 '297',
 '298',
 '299',
 '300',
 '301',
 '302',
 '303',
 '304',
 '305',
 '306',
 '307',
 '308',
 '309',
 '310',
 '311',
 '312',
 '313',
 '314',
 '315',
 '316',
 '317',
 '318',
 '319',
 '320',
 '321',
 '322',
 '323',
 '324',
 '325',
 '326',
 '327',
 '328',
 '329',
 '330',
 '331',
 '332',
 '333',
 '334',
 '335',
 '336',
 '337',
 '338',
 '339',
 '340',
 '341',
 '342',
 '343',
 '344',
 '345',
 '346',
 '347',
 '348',
 '349',
 '350',
 '351',
 '352',
 '353',
 '354',
 '355',
 '356',
 '357',
 '358',
 '359',
 '360',
 '361',
 '362',
 '363',
 '364',
 '365',
 '366',
 '367',
 '368',
 '369',
 '370',
 '371',
 '372',
 '373',
 '374',
 '375',
 '376',
 '377',
 '378',
 '379',
 '380',
 '381',
 '382',
 '383',
 '384',
 '385',
 '386',
 '387',
 '388',
 '389',
 '390',
 '391',
 '392',
 '393',
 '394',
 '395',
 '396',
 '397',
 '398',
 '399',
 '400',
 '401',
 '402',
 '403',
 '404',
 '405',
 '406',
 '407',
 '408',
 '409',
 '410',
 '411',
 '412',
 '413',
 '414',
 '415',
 '416',
 '417',
 '418',
 '419',
 '420',
 '421',
 '422',
 '423',
 '424',
 '425',
 '426',
 '427',
 '428',
 '429',
 '430',
 '431',
 '432',
 '433',
 '434',
 '435',
 '436',
 '437',
 '438',
 '439',
 '440',
 '441',
 '442',
 '443',
 '444',
 '445',
 '446',
 '447',
 '448',
 '449',
 '450',
 '451',
 '452',
 '453',
 '454',
 '455',
 '456',
 '457',
 '458',
 '459',
 '460',
 '461',
 '462',
 '463',
 '464',
 '465',
 '466',
 '467',
 '468',
 '469',
 '470',
 '471',
 '472',
 '473',
 '474',
 '475',
 '476',
 '477',
 '478',
 '479',
 '480',
 '481',
 '482',
 '483',
 '484',
 '485',
 '486',
 '487',
 '488',
 '489',
 '490',
 '491',
 '492',
 '493',
 '494',
 '495',
 '496',
 '497',
 '498',
 '499',
 '500',
 '501',
 '502',
 '503',
 '504',
 '505',
 '506',
 '507',
 '508',
 '509',
 '510',
 '511',
 '512',
 '513',
 '514',
 '515',
 '516',
 '517',
 '518',
 '519',
 '520',
 '521',
 '522',
 '523',
 '524',
 '525',
 '526',
 '527',
 '528',
 '529',
 '530',
 '531',
 '532',
 '533',
 '534',
 '535',
 '536',
 '537',
 '538',
 '539',
 '540',
 '541',
 '542',
 '543',
 '544',
 '545',
 '546',
 '547',
 '548',
 '549',
 '550',
 '551',
 '552',
 '553',
 '554',
 '555',
 '556',
 '557',
 '558',
 '559',
 '560',
 '561',
 '562',
 '563',
 '564',
 '565',
 '566',
 '567',
 '568',
 '569',
 '570',
 '571',
 '572',
 '573',
 '574',
 '575',
 '576',
 '577',
 '578',
 '579',
 '580',
 '581',
 '582',
 '583',
 '584',
 '585',
 '586',
 '587',
 '588',
 '589',
 '590',
 '591',
 '592',
 '593',
 '594',
 '595',
 '596',
 '597',
 '598',
 '599',
 '600',
 '601',
 '602',
 '603',
 '604',
 '605',
 '606',
 '607',
 '608',
 '609',
 '610',
 '611',
 '612',
 '613',
 '614',
 '615',
 '616',
 '617',
 '618',
 '619',
 '620',
 '621',
 '622',
 '623',
 '624',
 '625',
 '626',
 '627',
 '628',
 '629',
 '630',
 '631',
 '632',
 '633',
 '634',
 '635',
 '636',
 '637',
 '638',
 '639',
 '640',
 '641',
 '642',
 '643',
 '644',
 '645',
 '646',
 '647',
 '648',
 '649',
 '650',
 '651',
 '652',
 '653',
 '654',
 '655',
 '656',
 '657',
 '658',
 '659',
 '660',
 '661',
 '662',
 '663',
 '664',
 '665',
 '666',
 '667',
 '668',
 '669',
 '670',
 '671',
 '672',
 '673',
 '674',
 '675',
 '676',
 '677',
 '678',
 '679',
 '680',
 '681',
 '682',
 '683',
 '684',
 '685',
 '686',
 '687',
 '688',
 '689',
 '690',
 '691',
 '692',
 '693',
 '694',
 '695',
 '696',
 '697',
 '698',
 '699',
 '700',
 '701',
 '702',
 '703',
 '704',
 '705',
 '706',
 '707',
 '708',
 '709',
 '710',
 '711',
 '712',
 '713',
 '714',
 '715',
 '716',
 '717',
 '718',
 '719',
 '720',
 '721',
 '722',
 '723',
 '724',
 '725',
 '726',
 '727',
 '728',
 '729',
 '730',
 '731',
 '732',
 '733',
 '734',
 '735',
 '736',
 '737',
 '738',
 '739',
 '740',
 '741',
 '742',
 '743',
 '744',
 '745',
 '746',
 '747',
 '748',
 '749',
 '750',
 '751',
 '752',
 '753',
 '754',
 '755',
 '756',
 '757',
 '758',
 '759',
 '760',
 '761',
 '762',
 '763',
 '764',
 '765',
 '766',
 '767',
 '768',
 '769',
 '770',
 '771',
 '772',
 '773',
 '774',
 '775',
 '776',
 '777',
 '778',
 '779',
 '780',
 '781',
 '782',
 '783',
 '784',
 '785',
 '786',
 '787',
 '788',
 '789',
 '790',
 '791',
 '792',
 '793',
 '794',
 '795',
 '796',
 '797',
 '798',
 '799',
 '800',
 '801',
 '802',
 '803',
 '804',
 '805',
 '806',
 '807',
 '808',
 '809',
 '810',
 '811',
 '812',
 '813',
 '814',
 '815',
 '816',
 '817',
 '818',
 '819',
 '820',
 '821',
 '822',
 '823',
 '824',
 '825',
 '826',
 '827',
 '828',
 '829',
 '830',
 '831',
 '832',
 '833',
 '834',
 '835',
 '836',
 '837',
 '838',
 '839',
 '840',
 '841',
 '842',
 '843',
 '844',
 '845',
 '846',
 '847',
 '848',
 '849',
 '850',
 '851',
 '852',
 '853',
 '854',
 '855',
 '856',
 '857',
 '858',
 '859',
 '860',
 '861',
 '862',
 '863',
 '864',
 '865',
 '866',
 '867',
 '868',
 '869',
 '870',
 '871',
 '872',
 '873',
 '874',
 '875',
 '876',
 '877',
 '878',
 '879',
 '880',
 '881',
 '882',
 '883',
 '884',
 '885',
 '886',
 '887',
 '888',
 '889',
 '890',
 '891',
 '892',
 '893',
 '894',
 '895',
 '896',
 '897',
 '898',
 '899',
 '900',
 '901',
 '902',
 '903',
 '904',
 '905',
 '906',
 '907',
 '908',
 '909',
 '910',
 '911',
 '912',
 '913',
 '914',
 '915',
 '916',
 '917',
 '918',
 '919',
 '920',
 '921',
 '922',
 '923',
 '924',
 '925',
 '926',
 '927',
 '928',
 '929',
 '930',
 '931',
 '932',
 '933',
 '934',
 '935',
 '936',
 '937',
 '938',
 '939',
 '940',
 '941',
 '942',
 '943',
 '944',
 '945',
 '946',
 '947',
 '948',
 '949',
 '950',
 '951',
 '952',
 '953',
 '954',
 '955',
 '956',
 '957',
 '958',
 '959',
 '960',
 '961',
 '962',
 '963',
 '964',
 '965',
 '966',
 '967',
 '968',
 '969',
 '970',
 '971',
 '972',
 '973',
 '974',
 '975',
 '976',
 '977',
 '978',
 '979',
 '980',
 '981',
 '982',
 '983',
 '984',
 '985',
 '986',
 '987',
 '988',
 '989',
 '990',
 '991',
 '992',
 '993',
 '994',
 '995',
 '996',
 '997',
 '998',
 '999',
 ...]
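
Conceptually, the result above corresponds to replacing each ID with a running counter rendered as a string. A minimal sketch of that idea in plain Python follows. It is not Datumaro's implementation, and `Item` and `reindex_ids` are hypothetical names used only for illustration:

```python
from typing import Iterable, List, NamedTuple


class Item(NamedTuple):
    """Hypothetical stand-in for a dataset item (not a Datumaro class)."""

    id: str
    subset: str


def reindex_ids(items: Iterable[Item], start: int = 0) -> List[Item]:
    """Replace each ID with a sequential number, counting up from `start`."""
    return [
        # IDs stay strings, matching the '0', '1', ... shown in the output.
        item._replace(id=str(index))
        for index, item in enumerate(items, start=start)
    ]


original = [
    Item("000000397133", "val2017"),
    Item("000000037777", "val2017"),
    Item("000000252219", "val2017"),
]
reindexed = reindex_ids(original, start=0)
print([item.id for item in reindexed])
# prints ['0', '1', '2']
```

The sketch numbers items in iteration order. How the real transform orders items across subsets is not shown in this notebook, so treat that detail as an assumption.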

By applying id_from_image_name, we can roll the media_id back to the media name, i.e., the image file name without its extension.

[5]:
rollback_dataset = dataset.transform("id_from_image_name")
get_ids(rollback_dataset, subsets[0])
[5]:
['000000397133',
 '000000037777',
 '000000252219',
 '000000087038',
 '000000174482',
 '000000403385',
 '000000006818',
 '000000480985',
 '000000458054',
 '000000331352',
 '000000296649',
 '000000386912',
 '000000502136',
 '000000491497',
 '000000184791',
 '000000348881',
 '000000289393',
 '000000522713',
 '000000181666',
 '000000017627',
 '000000143931',
 '000000303818',
 '000000463730',
 '000000460347',
 '000000322864',
 '000000226111',
 '000000153299',
 '000000308394',
 '000000456496',
 '000000058636',
 '000000041888',
 '000000184321',
 '000000565778',
 '000000297343',
 '000000336587',
 '000000122745',
 '000000219578',
 '000000555705',
 '000000443303',
 '000000500663',
 '000000418281',
 '000000025560',
 '000000403817',
 '000000085329',
 '000000329323',
 '000000239274',
 '000000286994',
 '000000511321',
 '000000314294',
 '000000233771',
 '000000475779',
 '000000301867',
 '000000312421',
 '000000185250',
 '000000356427',
 '000000572517',
 '000000270244',
 '000000516316',
 '000000125211',
 '000000562121',
 '000000360661',
 '000000016228',
 '000000382088',
 '000000266409',
 '000000430961',
 '000000080671',
 '000000577539',
 '000000104612',
 '000000476258',
 '000000448365',
 '000000035197',
 '000000349860',
 '000000180135',
 '000000486438',
 '000000400573',
 '000000109798',
 '000000370677',
 '000000238866',
 '000000369370',
 '000000502737',
 '000000515579',
 '000000515445',
 '000000173383',
 '000000438862',
 '000000180560',
 '000000347693',
 '000000039956',
 '000000321214',
 '000000474028',
 '000000066523',
 '000000355257',
 '000000142092',
 '000000063154',
 '000000199551',
 '000000239347',
 '000000514508',
 '000000473237',
 '000000228144',
 '000000206027',
 '000000078915',
 '000000551215',
 '000000544519',
 '000000096493',
 '000000023899',
 '000000340175',
 '000000578500',
 '000000366141',
 '000000057597',
 '000000559842',
 '000000434230',
 '000000428454',
 '000000399462',
 '000000261061',
 '000000168330',
 '000000383384',
 '000000342006',
 '000000217285',
 '000000236412',
 '000000524456',
 '000000153343',
 '000000095786',
 '000000326541',
 '000000213086',
 '000000231339',
 '000000508730',
 '000000550426',
 '000000368294',
 '000000171190',
 '000000301135',
 '000000580294',
 '000000494869',
 '000000033638',
 '000000329219',
 '000000034873',
 '000000186980',
 '000000127182',
 '000000356387',
 '000000367680',
 '000000263796',
 '000000117425',
 '000000365387',
 '000000487583',
 '000000504711',
 '000000363840',
 '000000214720',
 '000000379453',
 '000000311295',
 '000000029393',
 '000000278848',
 '000000166391',
 '000000048153',
 '000000459153',
 '000000295713',
 '000000223130',
 '000000273132',
 '000000198960',
 '000000344059',
 '000000410428',
 '000000087875',
 '000000450758',
 '000000458790',
 '000000460160',
 '000000458109',
 '000000030675',
 '000000566524',
 '000000338428',
 '000000545826',
 '000000166277',
 '000000269314',
 '000000476415',
 '000000292082',
 '000000360137',
 '000000122046',
 '000000352684',
 '000000512836',
 '000000008021',
 '000000107226',
 '000000084477',
 '000000562243',
 '000000181859',
 '000000177015',
 '000000292236',
 '000000121506',
 '000000288042',
 '000000453860',
 '000000500257',
 '000000113403',
 '000000125062',
 '000000375015',
 '000000334719',
 '000000134112',
 '000000283520',
 '000000031269',
 '000000319721',
 '000000165351',
 '000000347265',
 '000000414170',
 '000000231508',
 '000000389381',
 '000000118921',
 '000000021503',
 '000000000785',
 '000000300842',
 '000000105014',
 '000000261982',
 '000000034205',
 '000000099242',
 '000000314709',
 '000000460494',
 '000000339442',
 '000000541055',
 '000000409475',
 '000000464786',
 '000000378605',
 '000000331817',
 '000000218091',
 '000000578545',
 '000000363207',
 '000000372577',
 '000000212166',
 '000000172571',
 '000000294831',
 '000000084431',
 '000000323355',
 '000000355325',
 '000000100582',
 '000000555412',
 '000000004495',
 '000000009483',
 '000000326082',
 '000000398237',
 '000000507223',
 '000000031050',
 '000000239537',
 '000000340930',
 '000000011813',
 '000000281414',
 '000000537991',
 '000000284282',
 '000000321333',
 '000000521282',
 '000000108026',
 '000000243204',
 '000000177935',
 '000000038829',
 '000000397327',
 '000000501523',
 '000000555050',
 '000000376442',
 '000000187243',
 '000000356347',
 '000000293044',
 '000000560279',
 '000000042276',
 '000000534827',
 '000000190756',
 '000000482917',
 '000000300659',
 '000000199977',
 '000000442480',
 '000000384350',
 '000000383621',
 '000000189828',
 '000000412894',
 '000000537153',
 '000000361103',
 '000000392722',
 '000000338560',
 '000000264535',
 '000000295231',
 '000000154947',
 '000000212559',
 '000000458755',
 '000000104782',
 '000000315257',
 '000000130599',
 '000000227187',
 '000000151662',
 '000000461275',
 '000000523811',
 '000000456559',
 '000000101068',
 '000000140640',
 '000000516708',
 '000000544605',
 '000000385190',
 '000000338986',
 '000000053994',
 '000000061171',
 '000000314034',
 '000000291490',
 '000000152740',
 '000000024919',
 '000000079837',
 '000000021903',
 '000000564133',
 '000000337055',
 '000000110638',
 '000000034139',
 '000000080340',
 '000000083113',
 '000000173033',
 '000000255664',
 '000000072813',
 '000000545129',
 '000000546011',
 '000000121031',
 '000000172547',
 '000000369081',
 '000000509131',
 '000000578922',
 '000000464089',
 '000000453708',
 '000000177714',
 '000000459887',
 '000000155179',
 '000000261116',
 '000000396274',
 '000000029640',
 '000000141328',
 '000000308430',
 '000000043314',
 '000000273715',
 '000000456303',
 '000000406611',
 '000000475064',
 '000000466567',
 '000000137246',
 '000000015079',
 '000000296284',
 '000000226147',
 '000000226903',
 '000000127517',
 '000000162092',
 '000000131379',
 '000000366611',
 '000000263969',
 '000000551439',
 '000000474167',
 '000000159458',
 '000000554735',
 '000000099428',
 '000000386352',
 '000000173004',
 '000000311394',
 '000000578489',
 '000000189310',
 '000000491366',
 '000000448076',
 '000000293804',
 '000000312237',
 '000000221291',
 '000000141821',
 '000000410650',
 '000000199310',
 '000000323151',
 '000000089648',
 '000000219283',
 '000000471869',
 '000000520264',
 '000000111179',
 '000000151000',
 '000000100624',
 '000000332570',
 '000000057238',
 '000000502732',
 '000000135561',
 '000000008277',
 '000000173044',
 '000000168458',
 '000000512194',
 '000000370042',
 '000000189436',
 '000000533958',
 '000000117645',
 '000000221708',
 '000000202228',
 '000000403565',
 '000000211042',
 '000000492878',
 '000000441586',
 '000000547816',
 '000000306733',
 '000000530099',
 '000000312278',
 '000000097679',
 '000000564127',
 '000000251065',
 '000000003845',
 '000000138819',
 '000000205834',
 '000000348708',
 '000000166521',
 '000000485802',
 '000000099054',
 '000000022969',
 '000000570539',
 '000000278353',
 '000000158548',
 '000000461405',
 '000000176606',
 '000000044699',
 '000000559956',
 '000000268996',
 '000000011197',
 '000000483667',
 '000000448810',
 '000000000724',
 '000000051961',
 '000000375278',
 '000000302165',
 '000000131131',
 '000000098839',
 '000000402992',
 '000000465675',
 '000000240754',
 '000000021167',
 '000000148730',
 '000000384468',
 '000000253742',
 '000000186873',
 '000000082180',
 '000000446522',
 '000000552902',
 '000000125405',
 '000000110211',
 '000000016010',
 '000000064462',
 '000000314182',
 '000000248980',
 '000000068387',
 '000000429281',
 '000000345466',
 '000000352900',
 '000000118367',
 '000000113235',
 '000000311303',
 '000000163640',
 '000000370999',
 '000000001490',
 '000000329456',
 '000000570471',
 '000000088269',
 '000000260470',
 '000000193494',
 '000000252776',
 '000000201072',
 '000000018150',
 '000000337498',
 '000000521405',
 '000000518770',
 '000000201646',
 '000000036936',
 '000000059044',
 '000000172946',
 '000000234607',
 '000000532690',
 '000000323895',
 '000000384670',
 '000000050326',
 '000000205542',
 '000000217957',
 '000000162035',
 '000000415727',
 '000000046252',
 '000000182021',
 '000000231747',
 '000000090284',
 '000000286553',
 '000000488736',
 '000000063602',
 '000000383386',
 '000000450686',
 '000000005060',
 '000000286523',
 '000000120420',
 '000000579655',
 '000000117908',
 '000000550322',
 '000000322844',
 '000000218362',
 '000000213224',
 '000000223747',
 '000000297578',
 '000000458992',
 '000000078266',
 '000000164602',
 '000000440475',
 '000000101762',
 '000000557501',
 '000000203317',
 '000000368940',
 '000000569917',
 '000000144798',
 '000000284623',
 '000000520301',
 '000000127987',
 '000000063740',
 '000000036494',
 '000000210032',
 '000000488270',
 '000000067180',
 '000000281179',
 '000000064359',
 '000000126226',
 '000000190923',
 '000000150265',
 '000000216739',
 '000000038048',
 '000000354829',
 '000000525155',
 '000000163314',
 '000000259571',
 '000000561679',
 '000000236166',
 '000000153529',
 '000000473015',
 '000000379800',
 '000000253835',
 '000000034071',
 '000000036861',
 '000000569565',
 '000000219271',
 '000000205647',
 '000000460841',
 '000000123131',
 '000000334006',
 '000000511599',
 '000000229858',
 '000000174004',
 '000000519764',
 '000000137576',
 '000000087470',
 '000000009769',
 '000000558114',
 '000000205776',
 '000000163257',
 '000000475678',
 '000000085478',
 '000000318080',
 '000000361551',
 '000000236784',
 '000000092839',
 '000000042296',
 '000000560266',
 '000000486479',
 '000000127955',
 '000000307658',
 '000000417465',
 '000000342971',
 '000000011760',
 '000000069106',
 '000000070158',
 '000000176634',
 '000000281447',
 '000000552371',
 '000000361919',
 '000000560256',
 '000000138115',
 '000000114871',
 '000000374369',
 '000000123213',
 '000000123321',
 '000000015278',
 '000000357742',
 '000000439854',
 '000000465836',
 '000000414385',
 '000000131556',
 '000000322724',
 '000000320664',
 '000000481390',
 '000000109916',
 '000000276434',
 '000000579635',
 '000000295316',
 '000000571313',
 '000000183127',
 '000000115898',
 '000000146358',
 '000000329542',
 '000000189752',
 '000000290163',
 '000000091406',
 '000000322352',
 '000000223959',
 '000000326248',
 '000000218439',
 '000000453722',
 '000000293625',
 '000000411817',
 '000000546964',
 '000000215259',
 '000000573094',
 '000000560011',
 '000000038576',
 '000000147729',
 '000000579307',
 '000000154425',
 '000000432898',
 '000000404923',
 '000000130586',
 '000000163057',
 '000000007511',
 '000000067406',
 '000000290179',
 '000000248752',
 '000000054593',
 '000000116208',
 '000000340697',
 '000000450303',
 '000000494427',
 '000000137294',
 '000000410880',
 '000000311180',
 '000000091654',
 '000000181796',
 '000000002431',
 '000000349184',
 '000000298396',
 '000000472046',
 '000000074058',
 '000000058029',
 '000000134096',
 '000000111951',
 '000000103585',
 '000000210273',
 '000000352584',
 '000000446651',
 '000000194875',
 '000000052017',
 '000000336309',
 '000000227478',
 '000000339870',
 '000000080666',
 '000000033707',
 '000000327601',
 '000000255749',
 '000000008762',
 '000000526392',
 '000000535578',
 '000000580757',
 '000000165039',
 '000000148719',
 '000000108440',
 '000000489842',
 '000000579818',
 '000000423229',
 '000000323828',
 '000000166287',
 '000000101420',
 '000000334555',
 '000000196759',
 '000000411665',
 '000000061418',
 '000000526751',
 '000000024021',
 '000000277020',
 '000000047828',
 '000000183716',
 '000000271997',
 '000000008532',
 '000000094336',
 '000000390555',
 '000000250282',
 '000000068409',
 '000000002299',
 '000000011051',
 '000000066038',
 '000000360960',
 '000000360097',
 '000000421455',
 '000000504589',
 '000000464522',
 '000000454750',
 '000000509735',
 '000000023034',
 '000000141671',
 '000000506656',
 '000000272566',
 '000000045728',
 '000000424551',
 '000000341719',
 '000000072795',
 '000000078959',
 '000000417285',
 '000000002157',
 '000000043816',
 '000000455555',
 '000000535306',
 '000000030504',
 '000000093353',
 '000000530052',
 '000000473118',
 '000000091779',
 '000000283113',
 '000000226130',
 '000000097278',
 '000000567640',
 '000000532493',
 '000000045550',
 '000000156643',
 '000000430056',
 '000000410456',
 '000000441286',
 '000000279541',
 '000000000885',
 '000000378284',
 '000000156076',
 '000000143572',
 '000000229849',
 '000000039551',
 '000000056344',
 '000000193348',
 '000000016958',
 '000000572678',
 '000000106235',
 '000000341681',
 '000000083172',
 '000000343524',
 '000000395801',
 '000000388056',
 '000000259690',
 '000000235836',
 '000000343218',
 '000000205105',
 '000000513283',
 '000000176446',
 '000000371677',
 '000000308531',
 '000000497599',
 '000000455352',
 '000000236914',
 '000000232684',
 '000000415238',
 '000000290843',
 '000000519522',
 '000000144784',
 '000000167486',
 '000000392228',
 '000000488673',
 '000000191013',
 '000000080057',
 '000000570169',
 '000000224807',
 '000000163562',
 '000000136355',
 '000000492362',
 '000000102707',
 '000000232563',
 '000000010977',
 '000000051598',
 '000000032285',
 '000000520910',
 '000000131273',
 '000000206411',
 '000000472375',
 '000000481404',
 '000000471991',
 '000000017436',
 '000000177934',
 '000000165518',
 '000000571718',
 '000000459467',
 '000000135673',
 '000000134886',
 '000000485895',
 '000000287545',
 '000000577182',
 '000000289222',
 '000000372819',
 '000000310072',
 '000000087144',
 '000000430875',
 '000000060347',
 '000000042070',
 '000000420916',
 '000000453584',
 '000000296224',
 '000000122606',
 '000000311909',
 '000000579893',
 '000000284296',
 '000000221017',
 '000000315001',
 '000000439715',
 '000000284991',
 '000000389566',
 '000000078843',
 '000000122927',
 '000000225532',
 '000000013659',
 '000000153568',
 '000000395633',
 '000000419096',
 '000000203488',
 '000000361268',
 '000000466125',
 '000000414795',
 '000000508101',
 '000000253386',
 '000000222991',
 '000000530854',
 '000000351810',
 '000000338624',
 '000000138492',
 '000000263463',
 '000000226592',
 '000000378454',
 '000000020059',
 '000000227686',
 '000000476215',
 '000000297698',
 '000000247917',
 '000000439522',
 '000000479448',
 '000000424721',
 '000000026690',
 '000000558854',
 '000000176901',
 '000000334767',
 '000000301563',
 '000000086755',
 '000000194471',
 '000000420281',
 '000000533206',
 '000000099810',
 '000000334483',
 '000000089670',
 '000000482275',
 '000000404805',
 '000000002261',
 '000000425702',
 '000000036844',
 '000000012576',
 '000000361238',
 '000000108253',
 '000000319935',
 '000000003934',
 '000000029596',
 '000000047740',
 '000000077460',
 '000000014439',
 '000000571893',
 '000000447314',
 '000000181303',
 '000000058350',
 '000000026465',
 '000000246968',
 '000000536947',
 '000000076731',
 '000000286182',
 '000000433980',
 '000000561366',
 '000000380913',
 '000000032887',
 '000000517687',
 '000000213035',
 '000000399205',
 '000000349837',
 '000000350002',
 '000000131431',
 '000000356248',
 '000000334399',
 '000000057150',
 '000000363666',
 '000000507235',
 '000000169996',
 '000000226417',
 '000000481573',
 '000000056127',
 '000000123480',
 '000000274687',
 '000000164637',
 '000000178028',
 '000000493286',
 '000000348216',
 '000000345027',
 '000000571804',
 '000000140658',
 '000000102644',
 '000000581615',
 '000000279887',
 '000000230008',
 '000000284698',
 '000000102356',
 '000000456394',
 '000000323709',
 '000000452122',
 '000000579158',
 '000000525322',
 '000000033114',
 '000000008690',
 '000000381639',
 '000000217614',
 '000000284445',
 '000000468124',
 '000000187144',
 '000000273198',
 '000000095843',
 '000000417779',
 '000000447342',
 '000000166563',
 '000000490125',
 '000000561009',
 '000000183675',
 '000000290248',
 '000000532058',
 '000000214200',
 '000000578093',
 '000000369751',
 '000000429011',
 '000000301061',
 '000000105264',
 '000000267434',
 '000000370711',
 '000000025393',
 '000000471087',
 '000000106757',
 '000000183648',
 '000000358525',
 '000000049269',
 '000000079144',
 '000000519688',
 '000000431727',
 '000000130699',
 '000000215245',
 '000000091921',
 '000000218424',
 '000000473974',
 '000000405249',
 '000000235784',
 '000000521540',
 '000000537506',
 '000000119445',
 '000000507015',
 '000000173830',
 '000000356498',
 '000000435081',
 '000000018575',
 '000000373315',
 '000000227765',
 '000000013546',
 '000000067310',
 '000000125936',
 '000000389109',
 '000000322211',
 '000000184384',
 '000000426329',
 '000000128476',
 '000000414034',
 '000000450488',
 '000000099182',
 '000000051738',
 '000000099039',
 '000000075456',
 '000000134882',
 '000000442323',
 '000000232489',
 '000000351823',
 '000000065736',
 '000000001000',
 '000000379842',
 '000000013923',
 '000000559543',
 '000000185890',
 '000000357978',
 '000000129492',
 '000000261097',
 '000000410510',
 '000000039951',
 '000000306700',
 '000000146457',
 '000000214224',
 '000000332845',
 '000000255483',
 '000000222455',
 '000000187271',
 '000000462629',
 '000000544565',
 '000000369771',
 '000000035963',
 '000000289516',
 '000000334309',
 '000000452084',
 '000000301718',
 '000000429598',
 '000000165257',
 '000000093437',
 '000000413552',
 '000000062025',
 '000000017379',
 '000000176778',
 '000000104572',
 '000000090108',
 '000000157124',
 '000000089556',
 '000000266206',
 '000000086220',
 '000000508602',
 ...]

Transform annotation#

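For task changing or for merging multiple heterogeneous datasets, we need to redefine the class definitions. Datumaro provides this class redefinition through remap_labels, as shown below. Here, motorcycle is merged into bicycle, and bus and truck are merged into car, so these three source labels no longer appear in the resulting categories.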

[6]:
mapping = {"motorcycle": "bicycle", "bus": "car", "truck": "car"}
remap_label_dataset = dataset.transform("remap_labels", mapping=mapping)
remap_label_dataset
[6]:
Dataset
        size=123287
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=122218
        annotations_count=1018861
subsets
        train2017: # of items=118287, # of annotated items=117266, # of annotations=976995, annotation types=['mask', 'polygon']
        val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
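
To make the effect on the category list concrete, here is a minimal plain-Python sketch of the renaming logic. It is only an illustration and not Datumaro's implementation: the helper name remap_label_names is hypothetical, and it assumes that labels missing from the mapping are kept unchanged, which is what the output above shows.

```python
# Illustrative model of label remapping; NOT Datumaro's implementation.
# `remap_label_names` is a hypothetical helper introduced for this sketch.
def remap_label_names(labels, mapping):
    """Rename labels via `mapping`, keep unmapped labels, drop duplicates."""
    remapped = []
    for name in labels:
        new_name = mapping.get(name, name)  # unmapped labels stay as they are
        if new_name not in remapped:  # merged labels are listed only once
            remapped.append(new_name)
    return remapped


# First nine COCO labels, in their original order.
labels = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat"]
mapping = {"motorcycle": "bicycle", "bus": "car", "truck": "car"}

print(remap_label_names(labels, mapping))
# ['person', 'bicycle', 'car', 'airplane', 'train', 'boat']
```

The printed list matches the beginning of the label list in the output above. The annotation count stays at 1018861 because the transform relabels annotations instead of removing them.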

Split datasets#

From now on, we give examples of extracting a subset of the imported dataset and splitting it into multiple subsets. Datumaro provides two types of splitters: a per-sample random splitter that follows the given subset ratios, and a task-specific splitter that takes annotation instances into account.

We first extract the validation dataset and split it into multiple cross-validation datasets.

[7]:
# from datumaro.components.dataset import Dataset

val_dataset = dataset.filter(
    '/item[subset="val2017"]'
)  # or Dataset(dataset.get_subset(subsets[0]))
val_dataset
[7]:
Dataset
        size=5000
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=4952
        annotations_count=41866
subsets
        val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
[8]:
splits = (("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2))
crossval_dataset = val_dataset.transform("random_split", splits=splits)
crossval_dataset
[8]:
Dataset
        size=5000
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=4952
        annotations_count=41866
subsets
        val1: # of items=1000, # of annotated items=991, # of annotations=8344, annotation types=['mask', 'polygon']
        val2: # of items=1000, # of annotated items=991, # of annotations=7646, annotation types=['mask', 'polygon']
        val3: # of items=1000, # of annotated items=993, # of annotations=8625, annotation types=['mask', 'polygon']
        val4: # of items=1000, # of annotated items=986, # of annotations=8752, annotation types=['mask', 'polygon']
        val5: # of items=1000, # of annotated items=991, # of annotations=8499, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
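
The per-sample behaviour can be sketched in a few lines of plain Python. This is an assumption-laden illustration rather than Datumaro's code: the function random_split_ids and its seed parameter are hypothetical, and the sketch only models how item ids are shuffled and cut by ratio.

```python
import random


# Hypothetical helper for illustration; NOT Datumaro's implementation.
def random_split_ids(item_ids, splits, seed=0):
    """Shuffle the ids, then cut them into consecutive chunks sized by ratio."""
    ids = list(item_ids)
    random.Random(seed).shuffle(ids)

    result = {}
    start = 0
    cumulative = 0.0
    for index, (name, ratio) in enumerate(splits):
        cumulative += ratio
        is_last = index == len(splits) - 1
        # The last subset takes the remainder so that no item is lost to rounding.
        end = len(ids) if is_last else round(cumulative * len(ids))
        result[name] = ids[start:end]
        start = end
    return result


splits = (("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2))
parts = random_split_ids(range(5000), splits)

print({name: len(ids) for name, ids in parts.items()})
```

Because only the items are counted, every subset ends up with 1000 items, while the number of annotations per subset is left to chance. In the output above it ranges from 7646 to 8752.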

Furthermore, Datumaro provides a task-specific splitter that splits from the viewpoint of annotations rather than samples. Running the cell below gives validation subsets that are well balanced in terms of the number of annotations.

[9]:
import datumaro.plugins.splitter as splitter

task = splitter.SplitTask.segmentation.name
splits = [("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2)]

crossval_per_ann_dataset = val_dataset.transform("split", task=task, splits=splits)
crossval_per_ann_dataset
[9]:
Dataset
        size=5000
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=4952
        annotations_count=41866
subsets
        val1: # of items=1000, # of annotated items=1000, # of annotations=8368, annotation types=['mask', 'polygon']
        val2: # of items=967, # of annotated items=919, # of annotations=8374, annotation types=['mask', 'polygon']
        val3: # of items=1032, # of annotated items=1032, # of annotations=8374, annotation types=['mask', 'polygon']
        val4: # of items=987, # of annotated items=987, # of annotations=8376, annotation types=['mask', 'polygon']
        val5: # of items=1014, # of annotated items=1014, # of annotations=8374, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
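
To quantify how much better balanced this result is, the annotation counts printed by the two splitters can be compared directly. The numbers below are copied from the random_split and split outputs above. The helper spread is a hypothetical name used only for this check.

```python
# Annotation counts per subset, copied from the two outputs above.
random_counts = {"val1": 8344, "val2": 7646, "val3": 8625, "val4": 8752, "val5": 8499}
task_counts = {"val1": 8368, "val2": 8374, "val3": 8374, "val4": 8376, "val5": 8374}


def spread(counts):
    """Difference between the largest and the smallest subset."""
    return max(counts.values()) - min(counts.values())


# Both splits distribute the same 41866 annotations.
print(sum(random_counts.values()), sum(task_counts.values()))

print(spread(random_counts))  # 1106
print(spread(task_counts))  # 8
```

The task-specific split trades equal item counts (967 to 1032 items per subset) for nearly equal annotation counts.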

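Lastly, we can rename the subsets as below. Several source subsets may map to the same target name, in which case they are merged into one subset.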

[10]:
mapping = {"val1": "train", "val2": "train", "val3": "train", "val4": "val", "val5": "test"}
test_dataset = crossval_per_ann_dataset.transform("map_subsets", mapping=mapping)
test_dataset
[10]:
Dataset
        size=5000
        source_path=coco_dataset
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=4952
        annotations_count=41866
subsets
        test: # of items=1014, # of annotated items=1014, # of annotations=8374, annotation types=['mask', 'polygon']
        train: # of items=2999, # of annotated items=2951, # of annotations=25116, annotation types=['mask', 'polygon']
        val: # of items=987, # of annotated items=987, # of annotations=8376, annotation types=['mask', 'polygon']
categories
        label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
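
The sizes of the renamed subsets can be cross-checked against the per-annotation split shown earlier, since merging subsets simply adds up their counts. The sketch below uses the item and annotation counts from that output. The helper merge_counts is hypothetical and only models the bookkeeping, not Datumaro's transform.

```python
from collections import Counter

# Per-subset counts, copied from the per-annotation split output.
items = {"val1": 1000, "val2": 967, "val3": 1032, "val4": 987, "val5": 1014}
annotations = {"val1": 8368, "val2": 8374, "val3": 8374, "val4": 8376, "val5": 8374}
mapping = {"val1": "train", "val2": "train", "val3": "train", "val4": "val", "val5": "test"}


# Hypothetical helper for illustration; NOT Datumaro's implementation.
def merge_counts(counts, mapping):
    """Sum the counts of all source subsets that share a target name."""
    merged = Counter()
    for subset, count in counts.items():
        merged[mapping.get(subset, subset)] += count
    return dict(merged)


print(merge_counts(items, mapping))
# {'train': 2999, 'val': 987, 'test': 1014}
print(merge_counts(annotations, mapping))
# {'train': 25116, 'val': 8376, 'test': 8374}
```

These totals agree with the train, val, and test lines in the output above.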