Transform Dataset: Re-id, Reindexing, Remapping, etc.
In this notebook example, we take a look at the Datumaro transform API. Transforms support splitting and merging subsets, redefining annotation information, re-identifying media, and changing the task by converting the annotation format, e.g., from masks to polygons, from bounding boxes to masks, or from shapes to bounding boxes.
Prerequisite
Download COCO 2017 validation dataset
Please refer to openvinotoolkit/datumaro to prepare the COCO 2017 validation dataset.
[2]:
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT
import datumaro as dm
dataset = dm.Dataset.import_from("coco_dataset", format="coco_instances")
print("Representation for sample COCO dataset")
dataset
WARNING:root:File 'coco_dataset/annotations/panoptic_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/panoptic_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/person_keypoints_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/captions_val2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/person_keypoints_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
WARNING:root:File 'coco_dataset/annotations/captions_train2017.json' was skipped, could't match this file with any of these tasks: coco_instances
Representation for sample COCO dataset
[2]:
Dataset
size=123287
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=122218
annotations_count=1018861
subsets
train2017: # of items=118287, # of annotated items=117266, # of annotations=976995, annotation types=['mask', 'polygon']
val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
Transform media ID
We first modify the media_id through a transformation. The original media_id values are shown below.
[3]:
subsets = list(dataset.subsets().keys())
print("Subset candidates:", subsets)
def get_ids(dataset: dm.Dataset, subset: str):
    # Collect the IDs of all items that belong to the given subset.
    ids = []
    for item in dataset:
        if item.subset == subset:
            ids.append(item.id)
    return ids
get_ids(dataset, subsets[0])
Subset candidates: ['val2017', 'train2017']
[3]:
['000000397133',
'000000037777',
'000000252219',
'000000087038',
'000000174482',
'000000403385',
'000000006818',
'000000480985',
'000000458054',
'000000331352',
'000000296649',
'000000386912',
'000000502136',
'000000491497',
'000000184791',
'000000348881',
'000000289393',
'000000522713',
'000000181666',
'000000017627',
'000000143931',
'000000303818',
'000000463730',
'000000460347',
'000000322864',
'000000226111',
'000000153299',
'000000308394',
'000000456496',
'000000058636',
'000000041888',
'000000184321',
'000000565778',
'000000297343',
'000000336587',
'000000122745',
'000000219578',
'000000555705',
'000000443303',
'000000500663',
'000000418281',
'000000025560',
'000000403817',
'000000085329',
'000000329323',
'000000239274',
'000000286994',
'000000511321',
'000000314294',
'000000233771',
'000000475779',
'000000301867',
'000000312421',
'000000185250',
'000000356427',
'000000572517',
'000000270244',
'000000516316',
'000000125211',
'000000562121',
'000000360661',
'000000016228',
'000000382088',
'000000266409',
'000000430961',
'000000080671',
'000000577539',
'000000104612',
'000000476258',
'000000448365',
'000000035197',
'000000349860',
'000000180135',
'000000486438',
'000000400573',
'000000109798',
'000000370677',
'000000238866',
'000000369370',
'000000502737',
'000000515579',
'000000515445',
'000000173383',
'000000438862',
'000000180560',
'000000347693',
'000000039956',
'000000321214',
'000000474028',
'000000066523',
'000000355257',
'000000142092',
'000000063154',
'000000199551',
'000000239347',
'000000514508',
'000000473237',
'000000228144',
'000000206027',
'000000078915',
'000000551215',
'000000544519',
'000000096493',
'000000023899',
'000000340175',
'000000578500',
'000000366141',
'000000057597',
'000000559842',
'000000434230',
'000000428454',
'000000399462',
'000000261061',
'000000168330',
'000000383384',
'000000342006',
'000000217285',
'000000236412',
'000000524456',
'000000153343',
'000000095786',
'000000326541',
'000000213086',
'000000231339',
'000000508730',
'000000550426',
'000000368294',
'000000171190',
'000000301135',
'000000580294',
'000000494869',
'000000033638',
'000000329219',
'000000034873',
'000000186980',
'000000127182',
'000000356387',
'000000367680',
'000000263796',
'000000117425',
'000000365387',
'000000487583',
'000000504711',
'000000363840',
'000000214720',
'000000379453',
'000000311295',
'000000029393',
'000000278848',
'000000166391',
'000000048153',
'000000459153',
'000000295713',
'000000223130',
'000000273132',
'000000198960',
'000000344059',
'000000410428',
'000000087875',
'000000450758',
'000000458790',
'000000460160',
'000000458109',
'000000030675',
'000000566524',
'000000338428',
'000000545826',
'000000166277',
'000000269314',
'000000476415',
'000000292082',
'000000360137',
'000000122046',
'000000352684',
'000000512836',
'000000008021',
'000000107226',
'000000084477',
'000000562243',
'000000181859',
'000000177015',
'000000292236',
'000000121506',
'000000288042',
'000000453860',
'000000500257',
'000000113403',
'000000125062',
'000000375015',
'000000334719',
'000000134112',
'000000283520',
'000000031269',
'000000319721',
'000000165351',
'000000347265',
'000000414170',
'000000231508',
'000000389381',
'000000118921',
'000000021503',
'000000000785',
'000000300842',
'000000105014',
'000000261982',
'000000034205',
'000000099242',
'000000314709',
'000000460494',
'000000339442',
'000000541055',
'000000409475',
'000000464786',
'000000378605',
'000000331817',
'000000218091',
'000000578545',
'000000363207',
'000000372577',
'000000212166',
'000000172571',
'000000294831',
'000000084431',
'000000323355',
'000000355325',
'000000100582',
'000000555412',
'000000004495',
'000000009483',
'000000326082',
'000000398237',
'000000507223',
'000000031050',
'000000239537',
'000000340930',
'000000011813',
'000000281414',
'000000537991',
'000000284282',
'000000321333',
'000000521282',
'000000108026',
'000000243204',
'000000177935',
'000000038829',
'000000397327',
'000000501523',
'000000555050',
'000000376442',
'000000187243',
'000000356347',
'000000293044',
'000000560279',
'000000042276',
'000000534827',
'000000190756',
'000000482917',
'000000300659',
'000000199977',
'000000442480',
'000000384350',
'000000383621',
'000000189828',
'000000412894',
'000000537153',
'000000361103',
'000000392722',
'000000338560',
'000000264535',
'000000295231',
'000000154947',
'000000212559',
'000000458755',
'000000104782',
'000000315257',
'000000130599',
'000000227187',
'000000151662',
'000000461275',
'000000523811',
'000000456559',
'000000101068',
'000000140640',
'000000516708',
'000000544605',
'000000385190',
'000000338986',
'000000053994',
'000000061171',
'000000314034',
'000000291490',
'000000152740',
'000000024919',
'000000079837',
'000000021903',
'000000564133',
'000000337055',
'000000110638',
'000000034139',
'000000080340',
'000000083113',
'000000173033',
'000000255664',
'000000072813',
'000000545129',
'000000546011',
'000000121031',
'000000172547',
'000000369081',
'000000509131',
'000000578922',
'000000464089',
'000000453708',
'000000177714',
'000000459887',
'000000155179',
'000000261116',
'000000396274',
'000000029640',
'000000141328',
'000000308430',
'000000043314',
'000000273715',
'000000456303',
'000000406611',
'000000475064',
'000000466567',
'000000137246',
'000000015079',
'000000296284',
'000000226147',
'000000226903',
'000000127517',
'000000162092',
'000000131379',
'000000366611',
'000000263969',
'000000551439',
'000000474167',
'000000159458',
'000000554735',
'000000099428',
'000000386352',
'000000173004',
'000000311394',
'000000578489',
'000000189310',
'000000491366',
'000000448076',
'000000293804',
'000000312237',
'000000221291',
'000000141821',
'000000410650',
'000000199310',
'000000323151',
'000000089648',
'000000219283',
'000000471869',
'000000520264',
'000000111179',
'000000151000',
'000000100624',
'000000332570',
'000000057238',
'000000502732',
'000000135561',
'000000008277',
'000000173044',
'000000168458',
'000000512194',
'000000370042',
'000000189436',
'000000533958',
'000000117645',
'000000221708',
'000000202228',
'000000403565',
'000000211042',
'000000492878',
'000000441586',
'000000547816',
'000000306733',
'000000530099',
'000000312278',
'000000097679',
'000000564127',
'000000251065',
'000000003845',
'000000138819',
'000000205834',
'000000348708',
'000000166521',
'000000485802',
'000000099054',
'000000022969',
'000000570539',
'000000278353',
'000000158548',
'000000461405',
'000000176606',
'000000044699',
'000000559956',
'000000268996',
'000000011197',
'000000483667',
'000000448810',
'000000000724',
'000000051961',
'000000375278',
'000000302165',
'000000131131',
'000000098839',
'000000402992',
'000000465675',
'000000240754',
'000000021167',
'000000148730',
'000000384468',
'000000253742',
'000000186873',
'000000082180',
'000000446522',
'000000552902',
'000000125405',
'000000110211',
'000000016010',
'000000064462',
'000000314182',
'000000248980',
'000000068387',
'000000429281',
'000000345466',
'000000352900',
'000000118367',
'000000113235',
'000000311303',
'000000163640',
'000000370999',
'000000001490',
'000000329456',
'000000570471',
'000000088269',
'000000260470',
'000000193494',
'000000252776',
'000000201072',
'000000018150',
'000000337498',
'000000521405',
'000000518770',
'000000201646',
'000000036936',
'000000059044',
'000000172946',
'000000234607',
'000000532690',
'000000323895',
'000000384670',
'000000050326',
'000000205542',
'000000217957',
'000000162035',
'000000415727',
'000000046252',
'000000182021',
'000000231747',
'000000090284',
'000000286553',
'000000488736',
'000000063602',
'000000383386',
'000000450686',
'000000005060',
'000000286523',
'000000120420',
'000000579655',
'000000117908',
'000000550322',
'000000322844',
'000000218362',
'000000213224',
'000000223747',
'000000297578',
'000000458992',
'000000078266',
'000000164602',
'000000440475',
'000000101762',
'000000557501',
'000000203317',
'000000368940',
'000000569917',
'000000144798',
'000000284623',
'000000520301',
'000000127987',
'000000063740',
'000000036494',
'000000210032',
'000000488270',
'000000067180',
'000000281179',
'000000064359',
'000000126226',
'000000190923',
'000000150265',
'000000216739',
'000000038048',
'000000354829',
'000000525155',
'000000163314',
'000000259571',
'000000561679',
'000000236166',
'000000153529',
'000000473015',
'000000379800',
'000000253835',
'000000034071',
'000000036861',
'000000569565',
'000000219271',
'000000205647',
'000000460841',
'000000123131',
'000000334006',
'000000511599',
'000000229858',
'000000174004',
'000000519764',
'000000137576',
'000000087470',
'000000009769',
'000000558114',
'000000205776',
'000000163257',
'000000475678',
'000000085478',
'000000318080',
'000000361551',
'000000236784',
'000000092839',
'000000042296',
'000000560266',
'000000486479',
'000000127955',
'000000307658',
'000000417465',
'000000342971',
'000000011760',
'000000069106',
'000000070158',
'000000176634',
'000000281447',
'000000552371',
'000000361919',
'000000560256',
'000000138115',
'000000114871',
'000000374369',
'000000123213',
'000000123321',
'000000015278',
'000000357742',
'000000439854',
'000000465836',
'000000414385',
'000000131556',
'000000322724',
'000000320664',
'000000481390',
'000000109916',
'000000276434',
'000000579635',
'000000295316',
'000000571313',
'000000183127',
'000000115898',
'000000146358',
'000000329542',
'000000189752',
'000000290163',
'000000091406',
'000000322352',
'000000223959',
'000000326248',
'000000218439',
'000000453722',
'000000293625',
'000000411817',
'000000546964',
'000000215259',
'000000573094',
'000000560011',
'000000038576',
'000000147729',
'000000579307',
'000000154425',
'000000432898',
'000000404923',
'000000130586',
'000000163057',
'000000007511',
'000000067406',
'000000290179',
'000000248752',
'000000054593',
'000000116208',
'000000340697',
'000000450303',
'000000494427',
'000000137294',
'000000410880',
'000000311180',
'000000091654',
'000000181796',
'000000002431',
'000000349184',
'000000298396',
'000000472046',
'000000074058',
'000000058029',
'000000134096',
'000000111951',
'000000103585',
'000000210273',
'000000352584',
'000000446651',
'000000194875',
'000000052017',
'000000336309',
'000000227478',
'000000339870',
'000000080666',
'000000033707',
'000000327601',
'000000255749',
'000000008762',
'000000526392',
'000000535578',
'000000580757',
'000000165039',
'000000148719',
'000000108440',
'000000489842',
'000000579818',
'000000423229',
'000000323828',
'000000166287',
'000000101420',
'000000334555',
'000000196759',
'000000411665',
'000000061418',
'000000526751',
'000000024021',
'000000277020',
'000000047828',
'000000183716',
'000000271997',
'000000008532',
'000000094336',
'000000390555',
'000000250282',
'000000068409',
'000000002299',
'000000011051',
'000000066038',
'000000360960',
'000000360097',
'000000421455',
'000000504589',
'000000464522',
'000000454750',
'000000509735',
'000000023034',
'000000141671',
'000000506656',
'000000272566',
'000000045728',
'000000424551',
'000000341719',
'000000072795',
'000000078959',
'000000417285',
'000000002157',
'000000043816',
'000000455555',
'000000535306',
'000000030504',
'000000093353',
'000000530052',
'000000473118',
'000000091779',
'000000283113',
'000000226130',
'000000097278',
'000000567640',
'000000532493',
'000000045550',
'000000156643',
'000000430056',
'000000410456',
'000000441286',
'000000279541',
'000000000885',
'000000378284',
'000000156076',
'000000143572',
'000000229849',
'000000039551',
'000000056344',
'000000193348',
'000000016958',
'000000572678',
'000000106235',
'000000341681',
'000000083172',
'000000343524',
'000000395801',
'000000388056',
'000000259690',
'000000235836',
'000000343218',
'000000205105',
'000000513283',
'000000176446',
'000000371677',
'000000308531',
'000000497599',
'000000455352',
'000000236914',
'000000232684',
'000000415238',
'000000290843',
'000000519522',
'000000144784',
'000000167486',
'000000392228',
'000000488673',
'000000191013',
'000000080057',
'000000570169',
'000000224807',
'000000163562',
'000000136355',
'000000492362',
'000000102707',
'000000232563',
'000000010977',
'000000051598',
'000000032285',
'000000520910',
'000000131273',
'000000206411',
'000000472375',
'000000481404',
'000000471991',
'000000017436',
'000000177934',
'000000165518',
'000000571718',
'000000459467',
'000000135673',
'000000134886',
'000000485895',
'000000287545',
'000000577182',
'000000289222',
'000000372819',
'000000310072',
'000000087144',
'000000430875',
'000000060347',
'000000042070',
'000000420916',
'000000453584',
'000000296224',
'000000122606',
'000000311909',
'000000579893',
'000000284296',
'000000221017',
'000000315001',
'000000439715',
'000000284991',
'000000389566',
'000000078843',
'000000122927',
'000000225532',
'000000013659',
'000000153568',
'000000395633',
'000000419096',
'000000203488',
'000000361268',
'000000466125',
'000000414795',
'000000508101',
'000000253386',
'000000222991',
'000000530854',
'000000351810',
'000000338624',
'000000138492',
'000000263463',
'000000226592',
'000000378454',
'000000020059',
'000000227686',
'000000476215',
'000000297698',
'000000247917',
'000000439522',
'000000479448',
'000000424721',
'000000026690',
'000000558854',
'000000176901',
'000000334767',
'000000301563',
'000000086755',
'000000194471',
'000000420281',
'000000533206',
'000000099810',
'000000334483',
'000000089670',
'000000482275',
'000000404805',
'000000002261',
'000000425702',
'000000036844',
'000000012576',
'000000361238',
'000000108253',
'000000319935',
'000000003934',
'000000029596',
'000000047740',
'000000077460',
'000000014439',
'000000571893',
'000000447314',
'000000181303',
'000000058350',
'000000026465',
'000000246968',
'000000536947',
'000000076731',
'000000286182',
'000000433980',
'000000561366',
'000000380913',
'000000032887',
'000000517687',
'000000213035',
'000000399205',
'000000349837',
'000000350002',
'000000131431',
'000000356248',
'000000334399',
'000000057150',
'000000363666',
'000000507235',
'000000169996',
'000000226417',
'000000481573',
'000000056127',
'000000123480',
'000000274687',
'000000164637',
'000000178028',
'000000493286',
'000000348216',
'000000345027',
'000000571804',
'000000140658',
'000000102644',
'000000581615',
'000000279887',
'000000230008',
'000000284698',
'000000102356',
'000000456394',
'000000323709',
'000000452122',
'000000579158',
'000000525322',
'000000033114',
'000000008690',
'000000381639',
'000000217614',
'000000284445',
'000000468124',
'000000187144',
'000000273198',
'000000095843',
'000000417779',
'000000447342',
'000000166563',
'000000490125',
'000000561009',
'000000183675',
'000000290248',
'000000532058',
'000000214200',
'000000578093',
'000000369751',
'000000429011',
'000000301061',
'000000105264',
'000000267434',
'000000370711',
'000000025393',
'000000471087',
'000000106757',
'000000183648',
'000000358525',
'000000049269',
'000000079144',
'000000519688',
'000000431727',
'000000130699',
'000000215245',
'000000091921',
'000000218424',
'000000473974',
'000000405249',
'000000235784',
'000000521540',
'000000537506',
'000000119445',
'000000507015',
'000000173830',
'000000356498',
'000000435081',
'000000018575',
'000000373315',
'000000227765',
'000000013546',
'000000067310',
'000000125936',
'000000389109',
'000000322211',
'000000184384',
'000000426329',
'000000128476',
'000000414034',
'000000450488',
'000000099182',
'000000051738',
'000000099039',
'000000075456',
'000000134882',
'000000442323',
'000000232489',
'000000351823',
'000000065736',
'000000001000',
'000000379842',
'000000013923',
'000000559543',
'000000185890',
'000000357978',
'000000129492',
'000000261097',
'000000410510',
'000000039951',
'000000306700',
'000000146457',
'000000214224',
'000000332845',
'000000255483',
'000000222455',
'000000187271',
'000000462629',
'000000544565',
'000000369771',
'000000035963',
'000000289516',
'000000334309',
'000000452084',
'000000301718',
'000000429598',
'000000165257',
'000000093437',
'000000413552',
'000000062025',
'000000017379',
'000000176778',
'000000104572',
'000000090108',
'000000157124',
'000000089556',
'000000266206',
'000000086220',
'000000508602',
...]
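Listing every ID produces very long output. If only a quick look is needed, the same filter can stop after a few matches. The following is a minimal sketch in plain Python: `preview_ids` and the stand-in items are hypothetical and not part of Datumaro, and the sketch only assumes that each item exposes `id` and `subset` attributes, as the items iterated in `get_ids` do.

```python
from itertools import islice
from types import SimpleNamespace


def preview_ids(items, subset, limit=5):
    """Return at most `limit` IDs of the items that belong to `subset`."""
    matching = (item.id for item in items if item.subset == subset)
    return list(islice(matching, limit))


# Stand-in items for illustration only; the train ID is a made-up placeholder.
items = [
    SimpleNamespace(id="000000397133", subset="val2017"),
    SimpleNamespace(id="example_train_item", subset="train2017"),
    SimpleNamespace(id="000000037777", subset="val2017"),
    SimpleNamespace(id="000000252219", subset="val2017"),
]
print(preview_ids(items, "val2017", limit=2))  # ['000000397133', '000000037777']
```

Because the generator is consumed lazily, iteration stops as soon as `limit` matching items have been found.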
Here we apply the reindex transformation so that each media_id becomes a sequential index counting up from start.
[4]:
reindexing_dataset = dataset.transform("reindex", start=0)
get_ids(reindexing_dataset, subsets[0])
[4]:
['0',
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'10',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'19',
'20',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'30',
'31',
'32',
'33',
'34',
'35',
'36',
'37',
'38',
'39',
'40',
'41',
'42',
'43',
'44',
'45',
'46',
'47',
'48',
'49',
'50',
'51',
'52',
'53',
'54',
'55',
'56',
'57',
'58',
'59',
'60',
'61',
'62',
'63',
'64',
'65',
'66',
'67',
'68',
'69',
'70',
'71',
'72',
'73',
'74',
'75',
'76',
'77',
'78',
'79',
'80',
'81',
'82',
'83',
'84',
'85',
'86',
'87',
'88',
'89',
'90',
'91',
'92',
'93',
'94',
'95',
'96',
'97',
'98',
'99',
'100',
'101',
'102',
'103',
'104',
'105',
'106',
'107',
'108',
'109',
'110',
'111',
'112',
'113',
'114',
'115',
'116',
'117',
'118',
'119',
'120',
'121',
'122',
'123',
'124',
'125',
'126',
'127',
'128',
'129',
'130',
'131',
'132',
'133',
'134',
'135',
'136',
'137',
'138',
'139',
'140',
'141',
'142',
'143',
'144',
'145',
'146',
'147',
'148',
'149',
'150',
'151',
'152',
'153',
'154',
'155',
'156',
'157',
'158',
'159',
'160',
'161',
'162',
'163',
'164',
'165',
'166',
'167',
'168',
'169',
'170',
'171',
'172',
'173',
'174',
'175',
'176',
'177',
'178',
'179',
'180',
'181',
'182',
'183',
'184',
'185',
'186',
'187',
'188',
'189',
'190',
'191',
'192',
'193',
'194',
'195',
'196',
'197',
'198',
'199',
'200',
'201',
'202',
'203',
'204',
'205',
'206',
'207',
'208',
'209',
'210',
'211',
'212',
'213',
'214',
'215',
'216',
'217',
'218',
'219',
'220',
'221',
'222',
'223',
'224',
'225',
'226',
'227',
'228',
'229',
'230',
'231',
'232',
'233',
'234',
'235',
'236',
'237',
'238',
'239',
'240',
'241',
'242',
'243',
'244',
'245',
'246',
'247',
'248',
'249',
'250',
'251',
'252',
'253',
'254',
'255',
'256',
'257',
'258',
'259',
'260',
'261',
'262',
'263',
'264',
'265',
'266',
'267',
'268',
'269',
'270',
'271',
'272',
'273',
'274',
'275',
'276',
'277',
'278',
'279',
'280',
'281',
'282',
'283',
'284',
'285',
'286',
'287',
'288',
'289',
'290',
'291',
'292',
'293',
'294',
'295',
'296',
'297',
'298',
'299',
'300',
'301',
'302',
'303',
'304',
'305',
'306',
'307',
'308',
'309',
'310',
'311',
'312',
'313',
'314',
'315',
'316',
'317',
'318',
'319',
'320',
'321',
'322',
'323',
'324',
'325',
'326',
'327',
'328',
'329',
'330',
'331',
'332',
'333',
'334',
'335',
'336',
'337',
'338',
'339',
'340',
'341',
'342',
'343',
'344',
'345',
'346',
'347',
'348',
'349',
'350',
'351',
'352',
'353',
'354',
'355',
'356',
'357',
'358',
'359',
'360',
'361',
'362',
'363',
'364',
'365',
'366',
'367',
'368',
'369',
'370',
'371',
'372',
'373',
'374',
'375',
'376',
'377',
'378',
'379',
'380',
'381',
'382',
'383',
'384',
'385',
'386',
'387',
'388',
'389',
'390',
'391',
'392',
'393',
'394',
'395',
'396',
'397',
'398',
'399',
'400',
'401',
'402',
'403',
'404',
'405',
'406',
'407',
'408',
'409',
'410',
'411',
'412',
'413',
'414',
'415',
'416',
'417',
'418',
'419',
'420',
'421',
'422',
'423',
'424',
'425',
'426',
'427',
'428',
'429',
'430',
'431',
'432',
'433',
'434',
'435',
'436',
'437',
'438',
'439',
'440',
'441',
'442',
'443',
'444',
'445',
'446',
'447',
'448',
'449',
'450',
'451',
'452',
'453',
'454',
'455',
'456',
'457',
'458',
'459',
'460',
'461',
'462',
'463',
'464',
'465',
'466',
'467',
'468',
'469',
'470',
'471',
'472',
'473',
'474',
'475',
'476',
'477',
'478',
'479',
'480',
'481',
'482',
'483',
'484',
'485',
'486',
'487',
'488',
'489',
'490',
'491',
'492',
'493',
'494',
'495',
'496',
'497',
'498',
'499',
'500',
'501',
'502',
'503',
'504',
'505',
'506',
'507',
'508',
'509',
'510',
'511',
'512',
'513',
'514',
'515',
'516',
'517',
'518',
'519',
'520',
'521',
'522',
'523',
'524',
'525',
'526',
'527',
'528',
'529',
'530',
'531',
'532',
'533',
'534',
'535',
'536',
'537',
'538',
'539',
'540',
'541',
'542',
'543',
'544',
'545',
'546',
'547',
'548',
'549',
'550',
'551',
'552',
'553',
'554',
'555',
'556',
'557',
'558',
'559',
'560',
'561',
'562',
'563',
'564',
'565',
'566',
'567',
'568',
'569',
'570',
'571',
'572',
'573',
'574',
'575',
'576',
'577',
'578',
'579',
'580',
'581',
'582',
'583',
'584',
'585',
'586',
'587',
'588',
'589',
'590',
'591',
'592',
'593',
'594',
'595',
'596',
'597',
'598',
'599',
'600',
'601',
'602',
'603',
'604',
'605',
'606',
'607',
'608',
'609',
'610',
'611',
'612',
'613',
'614',
'615',
'616',
'617',
'618',
'619',
'620',
'621',
'622',
'623',
'624',
'625',
'626',
'627',
'628',
'629',
'630',
'631',
'632',
'633',
'634',
'635',
'636',
'637',
'638',
'639',
'640',
'641',
'642',
'643',
'644',
'645',
'646',
'647',
'648',
'649',
'650',
'651',
'652',
'653',
'654',
'655',
'656',
'657',
'658',
'659',
'660',
'661',
'662',
'663',
'664',
'665',
'666',
'667',
'668',
'669',
'670',
'671',
'672',
'673',
'674',
'675',
'676',
'677',
'678',
'679',
'680',
'681',
'682',
'683',
'684',
'685',
'686',
'687',
'688',
'689',
'690',
'691',
'692',
'693',
'694',
'695',
'696',
'697',
'698',
'699',
'700',
'701',
'702',
'703',
'704',
'705',
'706',
'707',
'708',
'709',
'710',
'711',
'712',
'713',
'714',
'715',
'716',
'717',
'718',
'719',
'720',
'721',
'722',
'723',
'724',
'725',
'726',
'727',
'728',
'729',
'730',
'731',
'732',
'733',
'734',
'735',
'736',
'737',
'738',
'739',
'740',
'741',
'742',
'743',
'744',
'745',
'746',
'747',
'748',
'749',
'750',
'751',
'752',
'753',
'754',
'755',
'756',
'757',
'758',
'759',
'760',
'761',
'762',
'763',
'764',
'765',
'766',
'767',
'768',
'769',
'770',
'771',
'772',
'773',
'774',
'775',
'776',
'777',
'778',
'779',
'780',
'781',
'782',
'783',
'784',
'785',
'786',
'787',
'788',
'789',
'790',
'791',
'792',
'793',
'794',
'795',
'796',
'797',
'798',
'799',
'800',
'801',
'802',
'803',
'804',
'805',
'806',
'807',
'808',
'809',
'810',
'811',
'812',
'813',
'814',
'815',
'816',
'817',
'818',
'819',
'820',
'821',
'822',
'823',
'824',
'825',
'826',
'827',
'828',
'829',
'830',
'831',
'832',
'833',
'834',
'835',
'836',
'837',
'838',
'839',
'840',
'841',
'842',
'843',
'844',
'845',
'846',
'847',
'848',
'849',
'850',
'851',
'852',
'853',
'854',
'855',
'856',
'857',
'858',
'859',
'860',
'861',
'862',
'863',
'864',
'865',
'866',
'867',
'868',
'869',
'870',
'871',
'872',
'873',
'874',
'875',
'876',
'877',
'878',
'879',
'880',
'881',
'882',
'883',
'884',
'885',
'886',
'887',
'888',
'889',
'890',
'891',
'892',
'893',
'894',
'895',
'896',
'897',
'898',
'899',
'900',
'901',
'902',
'903',
'904',
'905',
'906',
'907',
'908',
'909',
'910',
'911',
'912',
'913',
'914',
'915',
'916',
'917',
'918',
'919',
'920',
'921',
'922',
'923',
'924',
'925',
'926',
'927',
'928',
'929',
'930',
'931',
'932',
'933',
'934',
'935',
'936',
'937',
'938',
'939',
'940',
'941',
'942',
'943',
'944',
'945',
'946',
'947',
'948',
'949',
'950',
'951',
'952',
'953',
'954',
'955',
'956',
'957',
'958',
'959',
'960',
'961',
'962',
'963',
'964',
'965',
'966',
'967',
'968',
'969',
'970',
'971',
'972',
'973',
'974',
'975',
'976',
'977',
'978',
'979',
'980',
'981',
'982',
'983',
'984',
'985',
'986',
'987',
'988',
'989',
'990',
'991',
'992',
'993',
'994',
'995',
'996',
'997',
'998',
'999',
...]
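The effect seen in the output above can be sketched in plain Python. This is only an illustration under stated assumptions, not Datumaro's implementation: the `Item` class and the `reindex` function are hypothetical stand-ins, and the sketch assumes that indices are assigned in iteration order and stored as strings, which is what the output suggests.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a dataset item (not a Datumaro class)."""

    id: str
    subset: str


def reindex(items, start=0):
    """Yield copies of `items` whose IDs are consecutive integers, as strings."""
    for offset, item in enumerate(items):
        yield replace(item, id=str(start + offset))


items = [
    Item("000000397133", "val2017"),
    Item("000000037777", "val2017"),
    Item("000000252219", "val2017"),
]
print([item.id for item in reindex(items, start=0)])   # ['0', '1', '2']
print([item.id for item in reindex(items, start=10)])  # ['10', '11', '12']
```

Only the ID changes in this sketch; the subset of each item is left as it was.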
By applying id_from_image_name, we can roll back the media_id to the media name.
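Conceptually, such a rollback derives each ID from the image file name with the directory and extension removed. A minimal sketch in plain Python, assuming that behavior: the helper below is hypothetical and only shares its name with the transform, and the example path layout is an assumption made for illustration.

```python
import os.path


def id_from_image_name(image_path):
    """Return the file name of `image_path` without directory or extension."""
    return os.path.splitext(os.path.basename(image_path))[0]


# Hypothetical path layout; only the file name matters for the result.
print(id_from_image_name("coco_dataset/images/val2017/000000397133.jpg"))  # 000000397133
```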
[5]:
rollback_dataset = dataset.transform("id_from_image_name")
get_ids(rollback_dataset, subsets[0])
[5]:
['000000397133',
'000000037777',
'000000252219',
'000000087038',
'000000174482',
'000000403385',
'000000006818',
'000000480985',
'000000458054',
'000000331352',
'000000296649',
'000000386912',
'000000502136',
'000000491497',
'000000184791',
'000000348881',
'000000289393',
'000000522713',
'000000181666',
'000000017627',
'000000143931',
'000000303818',
'000000463730',
'000000460347',
'000000322864',
'000000226111',
'000000153299',
'000000308394',
'000000456496',
'000000058636',
'000000041888',
'000000184321',
'000000565778',
'000000297343',
'000000336587',
'000000122745',
'000000219578',
'000000555705',
'000000443303',
'000000500663',
'000000418281',
'000000025560',
'000000403817',
'000000085329',
'000000329323',
'000000239274',
'000000286994',
'000000511321',
'000000314294',
'000000233771',
'000000475779',
'000000301867',
'000000312421',
'000000185250',
'000000356427',
'000000572517',
'000000270244',
'000000516316',
'000000125211',
'000000562121',
'000000360661',
'000000016228',
'000000382088',
'000000266409',
'000000430961',
'000000080671',
'000000577539',
'000000104612',
'000000476258',
'000000448365',
'000000035197',
'000000349860',
'000000180135',
'000000486438',
'000000400573',
'000000109798',
'000000370677',
'000000238866',
'000000369370',
'000000502737',
'000000515579',
'000000515445',
'000000173383',
'000000438862',
'000000180560',
'000000347693',
'000000039956',
'000000321214',
'000000474028',
'000000066523',
'000000355257',
'000000142092',
'000000063154',
'000000199551',
'000000239347',
'000000514508',
'000000473237',
'000000228144',
'000000206027',
'000000078915',
'000000551215',
'000000544519',
'000000096493',
'000000023899',
'000000340175',
'000000578500',
'000000366141',
'000000057597',
'000000559842',
'000000434230',
'000000428454',
'000000399462',
'000000261061',
'000000168330',
'000000383384',
'000000342006',
'000000217285',
'000000236412',
'000000524456',
'000000153343',
'000000095786',
'000000326541',
'000000213086',
'000000231339',
'000000508730',
'000000550426',
'000000368294',
'000000171190',
'000000301135',
'000000580294',
'000000494869',
'000000033638',
'000000329219',
'000000034873',
'000000186980',
'000000127182',
'000000356387',
'000000367680',
'000000263796',
'000000117425',
'000000365387',
'000000487583',
'000000504711',
'000000363840',
'000000214720',
'000000379453',
'000000311295',
'000000029393',
'000000278848',
'000000166391',
'000000048153',
'000000459153',
'000000295713',
'000000223130',
'000000273132',
'000000198960',
'000000344059',
'000000410428',
'000000087875',
'000000450758',
'000000458790',
'000000460160',
'000000458109',
'000000030675',
'000000566524',
'000000338428',
'000000545826',
'000000166277',
'000000269314',
'000000476415',
'000000292082',
'000000360137',
'000000122046',
'000000352684',
'000000512836',
'000000008021',
'000000107226',
'000000084477',
'000000562243',
'000000181859',
'000000177015',
'000000292236',
'000000121506',
'000000288042',
'000000453860',
'000000500257',
'000000113403',
'000000125062',
'000000375015',
'000000334719',
'000000134112',
'000000283520',
'000000031269',
'000000319721',
'000000165351',
'000000347265',
'000000414170',
'000000231508',
'000000389381',
'000000118921',
'000000021503',
'000000000785',
'000000300842',
'000000105014',
'000000261982',
'000000034205',
'000000099242',
'000000314709',
'000000460494',
'000000339442',
'000000541055',
'000000409475',
'000000464786',
'000000378605',
'000000331817',
'000000218091',
'000000578545',
'000000363207',
'000000372577',
'000000212166',
'000000172571',
'000000294831',
'000000084431',
'000000323355',
'000000355325',
'000000100582',
'000000555412',
'000000004495',
'000000009483',
'000000326082',
'000000398237',
'000000507223',
'000000031050',
'000000239537',
'000000340930',
'000000011813',
'000000281414',
'000000537991',
'000000284282',
'000000321333',
'000000521282',
'000000108026',
'000000243204',
'000000177935',
'000000038829',
'000000397327',
'000000501523',
'000000555050',
'000000376442',
'000000187243',
'000000356347',
'000000293044',
'000000560279',
'000000042276',
'000000534827',
'000000190756',
'000000482917',
'000000300659',
'000000199977',
'000000442480',
'000000384350',
'000000383621',
'000000189828',
'000000412894',
'000000537153',
'000000361103',
'000000392722',
'000000338560',
'000000264535',
'000000295231',
'000000154947',
'000000212559',
'000000458755',
'000000104782',
'000000315257',
'000000130599',
'000000227187',
'000000151662',
'000000461275',
'000000523811',
'000000456559',
'000000101068',
'000000140640',
'000000516708',
'000000544605',
'000000385190',
'000000338986',
'000000053994',
'000000061171',
'000000314034',
'000000291490',
'000000152740',
'000000024919',
'000000079837',
'000000021903',
'000000564133',
'000000337055',
'000000110638',
'000000034139',
'000000080340',
'000000083113',
'000000173033',
'000000255664',
'000000072813',
'000000545129',
'000000546011',
'000000121031',
'000000172547',
'000000369081',
'000000509131',
'000000578922',
'000000464089',
'000000453708',
'000000177714',
'000000459887',
'000000155179',
'000000261116',
'000000396274',
'000000029640',
'000000141328',
'000000308430',
'000000043314',
'000000273715',
'000000456303',
'000000406611',
'000000475064',
'000000466567',
'000000137246',
'000000015079',
'000000296284',
'000000226147',
'000000226903',
'000000127517',
'000000162092',
'000000131379',
'000000366611',
'000000263969',
'000000551439',
'000000474167',
'000000159458',
'000000554735',
'000000099428',
'000000386352',
'000000173004',
'000000311394',
'000000578489',
'000000189310',
'000000491366',
'000000448076',
'000000293804',
'000000312237',
'000000221291',
'000000141821',
'000000410650',
'000000199310',
'000000323151',
'000000089648',
'000000219283',
'000000471869',
'000000520264',
'000000111179',
'000000151000',
'000000100624',
'000000332570',
'000000057238',
'000000502732',
'000000135561',
'000000008277',
'000000173044',
'000000168458',
'000000512194',
'000000370042',
'000000189436',
'000000533958',
'000000117645',
'000000221708',
'000000202228',
'000000403565',
'000000211042',
'000000492878',
'000000441586',
'000000547816',
'000000306733',
'000000530099',
'000000312278',
'000000097679',
'000000564127',
'000000251065',
'000000003845',
'000000138819',
'000000205834',
'000000348708',
'000000166521',
'000000485802',
'000000099054',
'000000022969',
'000000570539',
'000000278353',
'000000158548',
'000000461405',
'000000176606',
'000000044699',
'000000559956',
'000000268996',
'000000011197',
'000000483667',
'000000448810',
'000000000724',
'000000051961',
'000000375278',
'000000302165',
'000000131131',
'000000098839',
'000000402992',
'000000465675',
'000000240754',
'000000021167',
'000000148730',
'000000384468',
'000000253742',
'000000186873',
'000000082180',
'000000446522',
'000000552902',
'000000125405',
'000000110211',
'000000016010',
'000000064462',
'000000314182',
'000000248980',
'000000068387',
'000000429281',
'000000345466',
'000000352900',
'000000118367',
'000000113235',
'000000311303',
'000000163640',
'000000370999',
'000000001490',
'000000329456',
'000000570471',
'000000088269',
'000000260470',
'000000193494',
'000000252776',
'000000201072',
'000000018150',
'000000337498',
'000000521405',
'000000518770',
'000000201646',
'000000036936',
'000000059044',
'000000172946',
'000000234607',
'000000532690',
'000000323895',
'000000384670',
'000000050326',
'000000205542',
'000000217957',
'000000162035',
'000000415727',
'000000046252',
'000000182021',
'000000231747',
'000000090284',
'000000286553',
'000000488736',
'000000063602',
'000000383386',
'000000450686',
'000000005060',
'000000286523',
'000000120420',
'000000579655',
'000000117908',
'000000550322',
'000000322844',
'000000218362',
'000000213224',
'000000223747',
'000000297578',
'000000458992',
'000000078266',
'000000164602',
'000000440475',
'000000101762',
'000000557501',
'000000203317',
'000000368940',
'000000569917',
'000000144798',
'000000284623',
'000000520301',
'000000127987',
'000000063740',
'000000036494',
'000000210032',
'000000488270',
'000000067180',
'000000281179',
'000000064359',
'000000126226',
'000000190923',
'000000150265',
'000000216739',
'000000038048',
'000000354829',
'000000525155',
'000000163314',
'000000259571',
'000000561679',
'000000236166',
'000000153529',
'000000473015',
'000000379800',
'000000253835',
'000000034071',
'000000036861',
'000000569565',
'000000219271',
'000000205647',
'000000460841',
'000000123131',
'000000334006',
'000000511599',
'000000229858',
'000000174004',
'000000519764',
'000000137576',
'000000087470',
'000000009769',
'000000558114',
'000000205776',
'000000163257',
'000000475678',
'000000085478',
'000000318080',
'000000361551',
'000000236784',
'000000092839',
'000000042296',
'000000560266',
'000000486479',
'000000127955',
'000000307658',
'000000417465',
'000000342971',
'000000011760',
'000000069106',
'000000070158',
'000000176634',
'000000281447',
'000000552371',
'000000361919',
'000000560256',
'000000138115',
'000000114871',
'000000374369',
'000000123213',
'000000123321',
'000000015278',
'000000357742',
'000000439854',
'000000465836',
'000000414385',
'000000131556',
'000000322724',
'000000320664',
'000000481390',
'000000109916',
'000000276434',
'000000579635',
'000000295316',
'000000571313',
'000000183127',
'000000115898',
'000000146358',
'000000329542',
'000000189752',
'000000290163',
'000000091406',
'000000322352',
'000000223959',
'000000326248',
'000000218439',
'000000453722',
'000000293625',
'000000411817',
'000000546964',
'000000215259',
'000000573094',
'000000560011',
'000000038576',
'000000147729',
'000000579307',
'000000154425',
'000000432898',
'000000404923',
'000000130586',
'000000163057',
'000000007511',
'000000067406',
'000000290179',
'000000248752',
'000000054593',
'000000116208',
'000000340697',
'000000450303',
'000000494427',
'000000137294',
'000000410880',
'000000311180',
'000000091654',
'000000181796',
'000000002431',
'000000349184',
'000000298396',
'000000472046',
'000000074058',
'000000058029',
'000000134096',
'000000111951',
'000000103585',
'000000210273',
'000000352584',
'000000446651',
'000000194875',
'000000052017',
'000000336309',
'000000227478',
'000000339870',
'000000080666',
'000000033707',
'000000327601',
'000000255749',
'000000008762',
'000000526392',
'000000535578',
'000000580757',
'000000165039',
'000000148719',
'000000108440',
'000000489842',
'000000579818',
'000000423229',
'000000323828',
'000000166287',
'000000101420',
'000000334555',
'000000196759',
'000000411665',
'000000061418',
'000000526751',
'000000024021',
'000000277020',
'000000047828',
'000000183716',
'000000271997',
'000000008532',
'000000094336',
'000000390555',
'000000250282',
'000000068409',
'000000002299',
'000000011051',
'000000066038',
'000000360960',
'000000360097',
'000000421455',
'000000504589',
'000000464522',
'000000454750',
'000000509735',
'000000023034',
'000000141671',
'000000506656',
'000000272566',
'000000045728',
'000000424551',
'000000341719',
'000000072795',
'000000078959',
'000000417285',
'000000002157',
'000000043816',
'000000455555',
'000000535306',
'000000030504',
'000000093353',
'000000530052',
'000000473118',
'000000091779',
'000000283113',
'000000226130',
'000000097278',
'000000567640',
'000000532493',
'000000045550',
'000000156643',
'000000430056',
'000000410456',
'000000441286',
'000000279541',
'000000000885',
'000000378284',
'000000156076',
'000000143572',
'000000229849',
'000000039551',
'000000056344',
'000000193348',
'000000016958',
'000000572678',
'000000106235',
'000000341681',
'000000083172',
'000000343524',
'000000395801',
'000000388056',
'000000259690',
'000000235836',
'000000343218',
'000000205105',
'000000513283',
'000000176446',
'000000371677',
'000000308531',
'000000497599',
'000000455352',
'000000236914',
'000000232684',
'000000415238',
'000000290843',
'000000519522',
'000000144784',
'000000167486',
'000000392228',
'000000488673',
'000000191013',
'000000080057',
'000000570169',
'000000224807',
'000000163562',
'000000136355',
'000000492362',
'000000102707',
'000000232563',
'000000010977',
'000000051598',
'000000032285',
'000000520910',
'000000131273',
'000000206411',
'000000472375',
'000000481404',
'000000471991',
'000000017436',
'000000177934',
'000000165518',
'000000571718',
'000000459467',
'000000135673',
'000000134886',
'000000485895',
'000000287545',
'000000577182',
'000000289222',
'000000372819',
'000000310072',
'000000087144',
'000000430875',
'000000060347',
'000000042070',
'000000420916',
'000000453584',
'000000296224',
'000000122606',
'000000311909',
'000000579893',
'000000284296',
'000000221017',
'000000315001',
'000000439715',
'000000284991',
'000000389566',
'000000078843',
'000000122927',
'000000225532',
'000000013659',
'000000153568',
'000000395633',
'000000419096',
'000000203488',
'000000361268',
'000000466125',
'000000414795',
'000000508101',
'000000253386',
'000000222991',
'000000530854',
'000000351810',
'000000338624',
'000000138492',
'000000263463',
'000000226592',
'000000378454',
'000000020059',
'000000227686',
'000000476215',
'000000297698',
'000000247917',
'000000439522',
'000000479448',
'000000424721',
'000000026690',
'000000558854',
'000000176901',
'000000334767',
'000000301563',
'000000086755',
'000000194471',
'000000420281',
'000000533206',
'000000099810',
'000000334483',
'000000089670',
'000000482275',
'000000404805',
'000000002261',
'000000425702',
'000000036844',
'000000012576',
'000000361238',
'000000108253',
'000000319935',
'000000003934',
'000000029596',
'000000047740',
'000000077460',
'000000014439',
'000000571893',
'000000447314',
'000000181303',
'000000058350',
'000000026465',
'000000246968',
'000000536947',
'000000076731',
'000000286182',
'000000433980',
'000000561366',
'000000380913',
'000000032887',
'000000517687',
'000000213035',
'000000399205',
'000000349837',
'000000350002',
'000000131431',
'000000356248',
'000000334399',
'000000057150',
'000000363666',
'000000507235',
'000000169996',
'000000226417',
'000000481573',
'000000056127',
'000000123480',
'000000274687',
'000000164637',
'000000178028',
'000000493286',
'000000348216',
'000000345027',
'000000571804',
'000000140658',
'000000102644',
'000000581615',
'000000279887',
'000000230008',
'000000284698',
'000000102356',
'000000456394',
'000000323709',
'000000452122',
'000000579158',
'000000525322',
'000000033114',
'000000008690',
'000000381639',
'000000217614',
'000000284445',
'000000468124',
'000000187144',
'000000273198',
'000000095843',
'000000417779',
'000000447342',
'000000166563',
'000000490125',
'000000561009',
'000000183675',
'000000290248',
'000000532058',
'000000214200',
'000000578093',
'000000369751',
'000000429011',
'000000301061',
'000000105264',
'000000267434',
'000000370711',
'000000025393',
'000000471087',
'000000106757',
'000000183648',
'000000358525',
'000000049269',
'000000079144',
'000000519688',
'000000431727',
'000000130699',
'000000215245',
'000000091921',
'000000218424',
'000000473974',
'000000405249',
'000000235784',
'000000521540',
'000000537506',
'000000119445',
'000000507015',
'000000173830',
'000000356498',
'000000435081',
'000000018575',
'000000373315',
'000000227765',
'000000013546',
'000000067310',
'000000125936',
'000000389109',
'000000322211',
'000000184384',
'000000426329',
'000000128476',
'000000414034',
'000000450488',
'000000099182',
'000000051738',
'000000099039',
'000000075456',
'000000134882',
'000000442323',
'000000232489',
'000000351823',
'000000065736',
'000000001000',
'000000379842',
'000000013923',
'000000559543',
'000000185890',
'000000357978',
'000000129492',
'000000261097',
'000000410510',
'000000039951',
'000000306700',
'000000146457',
'000000214224',
'000000332845',
'000000255483',
'000000222455',
'000000187271',
'000000462629',
'000000544565',
'000000369771',
'000000035963',
'000000289516',
'000000334309',
'000000452084',
'000000301718',
'000000429598',
'000000165257',
'000000093437',
'000000413552',
'000000062025',
'000000017379',
'000000176778',
'000000104572',
'000000090108',
'000000157124',
'000000089556',
'000000266206',
'000000086220',
'000000508602',
...]
Transform annotation
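When changing tasks or merging multiple heterogeneous datasets, we need to redefine the class definitions. Datumaro provides this class redefinition through the remap_labels transform, as shown below. In this example, motorcycle is merged into bicycle, and bus and truck are merged into car, so these three source labels no longer appear in the resulting label list.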
[6]:
mapping = {"motorcycle": "bicycle", "bus": "car", "truck": "car"}
remap_label_dataset = dataset.transform("remap_labels", mapping=mapping)
remap_label_dataset
[6]:
Dataset
size=123287
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=122218
annotations_count=1018861
subsets
train2017: # of items=118287, # of annotated items=117266, # of annotations=976995, annotation types=['mask', 'polygon']
val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
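To make the effect of the mapping concrete, here is a minimal pure-Python sketch of what collapsing labels means for the category list and for annotation label indices. It is an illustration only, not Datumaro's implementation: the helper name `remap_labels`, the shortened label list, and the ordering rule (a label keeps the position of its first occurrence) are assumptions made for this example.

```python
def remap_labels(labels, mapping):
    """Collapse label names according to `mapping`.

    Returns the new label list and a dict that maps each old label
    index to its index in the new list. Hypothetical helper, not a
    Datumaro function.
    """
    new_labels = []
    position = {}   # label name -> index in new_labels
    index_map = {}  # old index -> new index
    for old_index, name in enumerate(labels):
        target = mapping.get(name, name)  # unmapped labels stay as they are
        if target not in position:
            position[target] = len(new_labels)
            new_labels.append(target)
        index_map[old_index] = position[target]
    return new_labels, index_map


# A shortened label list, used only for illustration.
labels = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck"]
mapping = {"motorcycle": "bicycle", "bus": "car", "truck": "car"}

new_labels, index_map = remap_labels(labels, mapping)
print(new_labels)
print(index_map)
```

Every annotation that used an old label index would then be rewritten through `index_map`, which is why the annotation counts in the output above stay the same while the label list shrinks.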
Split datasets
The following examples extract a subset of the imported dataset and split it into multiple subsets. Datumaro provides two types of splitter: a per-sample random splitter that follows the given subset ratios, and a task-specific splitter that takes annotation instances into account.
We first extract the validation subset and split it into multiple cross-validation subsets.
[7]:
# from datumaro.components.dataset import Dataset
# Keep only the items that belong to the "val2017" subset.
val_dataset = dataset.filter(
    '/item[subset="val2017"]'
)  # or Dataset(dataset.get_subset(subsets[0]))
val_dataset
[7]:
Dataset
size=5000
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=4952
annotations_count=41866
subsets
val2017: # of items=5000, # of annotated items=4952, # of annotations=41866, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
[8]:
splits = (("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2))
crossval_dataset = val_dataset.transform("random_split", splits=splits)
crossval_dataset
[8]:
Dataset
size=5000
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=4952
annotations_count=41866
subsets
val1: # of items=1000, # of annotated items=991, # of annotations=8344, annotation types=['mask', 'polygon']
val2: # of items=1000, # of annotated items=991, # of annotations=7646, annotation types=['mask', 'polygon']
val3: # of items=1000, # of annotated items=993, # of annotations=8625, annotation types=['mask', 'polygon']
val4: # of items=1000, # of annotated items=986, # of annotations=8752, annotation types=['mask', 'polygon']
val5: # of items=1000, # of annotated items=991, # of annotations=8499, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
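The per-sample random split can be pictured with the following minimal sketch, which shuffles item ids and cuts the shuffled list at the cumulative ratios. It is not Datumaro's code: the function name `random_split`, the `seed` parameter, and the synthetic item ids are assumptions made for illustration.

```python
import random
from collections import Counter


def random_split(item_ids, splits, seed=0):
    """Assign each item id to a subset name according to the given ratios.

    `splits` is a sequence of (subset_name, ratio) pairs whose ratios
    sum to 1. Hypothetical helper, not a Datumaro function.
    """
    if abs(sum(ratio for _, ratio in splits) - 1.0) > 1e-6:
        raise ValueError("split ratios must sum to 1")

    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    assignment = {}
    total = len(shuffled)
    start = 0
    cumulative = 0.0
    for i, (name, ratio) in enumerate(splits):
        cumulative += ratio
        # The last subset takes whatever remains, so no item is dropped.
        end = total if i == len(splits) - 1 else round(cumulative * total)
        for item_id in shuffled[start:end]:
            assignment[item_id] = name
        start = end
    return assignment


splits = (("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2))
item_ids = [f"{i:012d}" for i in range(100)]  # synthetic ids, COCO-style zero padding

assignment = random_split(item_ids, splits, seed=42)
sizes = Counter(assignment.values())
print(sizes)
```

Because only the number of items per subset is controlled, the number of annotations per subset can vary, as the output above shows (7646 annotations in val2 versus 8752 in val4).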
Furthermore, Datumaro can split a dataset from the viewpoint of annotations rather than samples through a task-specific splitter. Running the cell below yields validation subsets that are well balanced in terms of the number of annotations.
[9]:
import datumaro.plugins.splitter as splitter
task = splitter.SplitTask.segmentation.name
splits = [("val1", 0.2), ("val2", 0.2), ("val3", 0.2), ("val4", 0.2), ("val5", 0.2)]
crossval_per_ann_dataset = val_dataset.transform("split", task=task, splits=splits)
crossval_per_ann_dataset
[9]:
Dataset
size=5000
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=4952
annotations_count=41866
subsets
val1: # of items=1000, # of annotated items=1000, # of annotations=8368, annotation types=['mask', 'polygon']
val2: # of items=967, # of annotated items=919, # of annotations=8374, annotation types=['mask', 'polygon']
val3: # of items=1032, # of annotated items=1032, # of annotations=8374, annotation types=['mask', 'polygon']
val4: # of items=987, # of annotated items=987, # of annotations=8376, annotation types=['mask', 'polygon']
val5: # of items=1014, # of annotated items=1014, # of annotations=8374, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
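One way to picture annotation-balanced splitting is a greedy assignment: visit items from the most annotated to the least, and place each one in the subset that is currently furthest below its annotation target. The sketch below is an assumption-laden illustration of that idea, not the algorithm Datumaro's task-specific splitter uses; the function name `split_by_annotations` and the toy annotation counts are hypothetical.

```python
def split_by_annotations(ann_counts, splits):
    """Greedily balance the number of annotations across subsets.

    `ann_counts` maps item id -> number of annotations on that item.
    `splits` is a sequence of (subset_name, ratio) pairs.
    Hypothetical helper, not a Datumaro function.
    """
    total = sum(ann_counts.values())
    targets = {name: ratio * total for name, ratio in splits}
    filled = {name: 0 for name, _ in splits}
    assignment = {}

    # Largest items first, so the small items can even out the remainder.
    ordered = sorted(ann_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for item_id, count in ordered:
        # Pick the subset with the largest remaining deficit.
        name = max(filled, key=lambda s: targets[s] - filled[s])
        assignment[item_id] = name
        filled[name] += count
    return assignment, filled


# Toy per-item annotation counts, used only for illustration.
ann_counts = {f"img{i}": c for i, c in enumerate([9, 1, 7, 3, 5, 5, 8, 2, 6, 4])}
splits = [("a", 0.5), ("b", 0.5)]

assignment, filled = split_by_annotations(ann_counts, splits)
print(filled)
```

The trade-off is visible in the output above: the annotation counts per subset are nearly identical (8368 to 8376), while the item counts per subset are no longer exactly 1000.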
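Lastly, we can rename subsets with map_subsets as shown below. Mapping several subsets to the same name merges them: here val1, val2, and val3 become train, val4 becomes val, and val5 becomes test.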
[10]:
mapping = {"val1": "train", "val2": "train", "val3": "train", "val4": "val", "val5": "test"}
test_dataset = crossval_per_ann_dataset.transform("map_subsets", mapping=mapping)
test_dataset
[10]:
Dataset
size=5000
source_path=coco_dataset
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=4952
annotations_count=41866
subsets
test: # of items=1014, # of annotated items=1014, # of annotations=8374, annotation types=['mask', 'polygon']
train: # of items=2999, # of annotated items=2951, # of annotations=25116, annotation types=['mask', 'polygon']
val: # of items=987, # of annotated items=987, # of annotations=8376, annotation types=['mask', 'polygon']
categories
label: ['person', 'bicycle', 'car', 'airplane', 'train', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
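The subset sizes above can be checked with a small pure-Python sketch of the renaming step. The per-subset item counts are taken from the annotation-balanced split shown earlier; the helper `map_subsets`, the synthetic item ids, and the rule that unmapped subset names are kept unchanged are assumptions made for this illustration, not Datumaro's implementation.

```python
from collections import Counter


def map_subsets(item_subsets, mapping):
    """Rename the subset of every item; unmapped subset names are kept.

    Hypothetical helper, not a Datumaro function.
    """
    return {
        item_id: mapping.get(subset, subset)
        for item_id, subset in item_subsets.items()
    }


# Item counts per subset, as reported for the annotation-balanced split.
subset_sizes = {"val1": 1000, "val2": 967, "val3": 1032, "val4": 987, "val5": 1014}

# Synthetic item ids: one entry per item, tagged with its subset.
item_subsets = {
    f"{subset}-{i}": subset
    for subset, size in subset_sizes.items()
    for i in range(size)
}

mapping = {"val1": "train", "val2": "train", "val3": "train", "val4": "val", "val5": "test"}
renamed = map_subsets(item_subsets, mapping)
merged_sizes = Counter(renamed.values())
print(merged_sizes)
```

The merged train subset holds 1000 + 967 + 1032 = 2999 items, which matches the output above.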