Fast Data Loading

OpenVINO™ Training Extensions provides several ways to boost model training speed, one of which is fast data loading.

Faster Augmentation

AugMix

AugMix [1] is a simple yet powerful augmentation technique for improving the robustness and uncertainty estimates of image classification models. OpenVINO™ Training Extensions implements it in Cython for faster augmentation. No configuration is needed, because the Cythonized AugMix is used by default.
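To illustrate what AugMix computes, here is a minimal NumPy sketch of the mixing scheme described in the AugMix paper. Several short chains of randomly chosen operations are combined with Dirichlet-distributed weights, and the result is blended with the original image using a Beta-distributed weight. The three toy operations (flip, shift, brightness) are stand-ins chosen for brevity, not the operation set of the paper or of the Cythonized implementation in OpenVINO™ Training Extensions, and the default parameter values are illustrative assumptions.

```python
import numpy as np

# Toy stand-ins for AugMix's image operations (the paper uses PIL ops such as
# rotate, shear, posterize). Each maps a float32 HxWxC array in [0, 1] to the same.
def op_flip(img, rng):
    return img[:, ::-1, :]

def op_shift(img, rng):
    return np.roll(img, shift=int(rng.integers(1, 4)), axis=0)

def op_brightness(img, rng):
    return np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)

OPS = [op_flip, op_shift, op_brightness]

def augmix(img, rng, width=3, max_depth=3, alpha=1.0):
    """Mix `width` random augmentation chains, then blend the mix with the original."""
    ws = rng.dirichlet([alpha] * width)   # convex weights over the chains
    m = rng.beta(alpha, alpha)            # weight between original image and the mix
    mix = np.zeros_like(img)
    for i in range(width):
        aug = img
        depth = int(rng.integers(1, max_depth + 1))  # chain length in [1, max_depth]
        for _ in range(depth):
            op = OPS[rng.integers(len(OPS))]         # pick a random operation
            aug = op(aug, rng)
        mix += ws[i] * aug
    return (1.0 - m) * img + m * mix
```

Because both the chain weights and the final blend are convex combinations, an input in [0, 1] yields an output of the same shape that stays in [0, 1].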

Caching

In-Memory Caching

OpenVINO™ Training Extensions can cache decoded images in main memory. When the batch size is large (as is common for classification tasks) or the dataset contains high-resolution images, image decoding can account for a non-negligible share of the data pre-processing overhead. In those cases, enabling in-memory caching can improve GPU utilization and reduce model training time. The cache size is set with the --mem-cache-size option:

$ otx train --mem-cache-size=8GB ..
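The idea behind the in-memory cache can be pictured with a minimal sketch, assuming decoded images are NumPy arrays and that a size string such as 8GB uses binary units. DecodedImageCache and parse_size are hypothetical names, and the actual cache in OpenVINO™ Training Extensions may differ, for example in how it is shared across data-loader workers or in what happens when it is full. In this sketch, once the budget is used up, further images are still decoded but no longer cached.

```python
import re
from typing import Callable, Dict

import numpy as np

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_size(text: str) -> int:
    """Parse a size string such as '8GB' or '512MB' into bytes (binary units assumed)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


class DecodedImageCache:
    """Keep decoded images in RAM until a byte budget is used up (hypothetical)."""

    def __init__(self, size: str, decode: Callable[[str], np.ndarray]):
        self.budget = parse_size(size)
        self.decode = decode
        self.used = 0
        self.hits = 0
        self._store: Dict[str, np.ndarray] = {}

    def get(self, path: str) -> np.ndarray:
        if path in self._store:          # cache hit: skip decoding entirely
            self.hits += 1
            return self._store[path]
        img = self.decode(path)          # the expensive step the cache avoids
        if self.used + img.nbytes <= self.budget:
            self._store[path] = img
            self.used += img.nbytes
        return img
```

A fixed byte budget keeps memory use predictable; the trade-off is that images arriving after the budget is exhausted are decoded on every access.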

Storage Caching

OpenVINO™ Training Extensions uses Datumaro under the hood for dataset management. Since Datumaro supports Apache Arrow, OpenVINO™ Training Extensions can speed up data loading by reading from memory-mapped Arrow files, at the expense of additional storage consumption. The caching scheme is selected with the algo_backend.storage_cache_scheme parameter:

$ otx train .. params --algo_backend.storage_cache_scheme JPEG/75

By default, the cache is saved in $HOME/.cache/otx. To store it elsewhere, set the OTX_CACHE environment variable:

$ OTX_CACHE=/path/to/cache otx train .. params --algo_backend.storage_cache_scheme JPEG/75
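The lookup order described above (OTX_CACHE if set, otherwise $HOME/.cache/otx) can be sketched as follows. resolve_cache_dir is a hypothetical helper for illustration, not a function exported by OpenVINO™ Training Extensions, and treating an empty OTX_CACHE as unset is an assumption of this sketch.

```python
import os
from pathlib import Path


def resolve_cache_dir() -> Path:
    """Return $OTX_CACHE if it is set and non-empty, else $HOME/.cache/otx (hypothetical)."""
    override = os.environ.get("OTX_CACHE")
    if override:  # assumption: an empty value falls back to the default
        return Path(override).expanduser()
    return Path.home() / ".cache" / "otx"
```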

Please refer to the Datumaro documentation for the available schemes; we recommend JPEG/75 for fast data loading.