Level 2: Dataset download
Datumaro can download public datasets through the TensorFlow Datasets (TFDS) download API. With this feature, you can download the supported datasets from the TFDS catalog.
Prepare installation
To use the Datumaro download feature, install Datumaro with the [tf,tfds] extras:
pip install datumaro[tf,tfds]
Note
The download feature is not available if you installed Datumaro with the default options, e.g., pip install datumaro. Make sure the [tf,tfds] extras are installed.
Which datasets are available?
To list the available DATASET_ID values, run the following command:
datum download describe [--report-format {text,json}] [--report-file REPORT_FILE]
How can we download datasets?
To download a dataset, run the following command and pass the ID of the dataset you want with -i DATASET_ID. Optionally, you can also specify the output format (-f OUTPUT_FORMAT) and the output directory (-o DST_DIR).
datum download get [-h] -i DATASET_ID [-f OUTPUT_FORMAT] [-o DST_DIR] [--overwrite] [-s SUBSET] ...
Note
By default, download does not export the media files (e.g., images). We recommend running this command with the --save-media option so that the media files are exported as well, for example, datum download get -i tfds:mnist -- --save-media.
In the next level, we will look into how to import and export the dataset using Datumaro!