Fast Data Loading
=================
OpenVINO™ Training Extensions provides several ways to boost model training speed,
one of which is fast data loading.
===================
Faster Augmentation
===================
******
AugMix
******
AugMix [1]_ is a simple yet powerful augmentation technique
that improves the robustness and uncertainty estimates of image classification models.
OpenVINO™ Training Extensions implements it in Cython for faster augmentation.
Users do not need to configure anything as cythonized AugMix is used by default.
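
As a rough illustration of what the augmentation computes, the following pure-Python sketch
mixes several randomly composed augmentation chains with the original input, following the
procedure described in the AugMix paper. It is not the Cython implementation shipped with
OpenVINO™ Training Extensions: the function names, the toy operations on a flat list of
pixel values in ``[0, 1]``, and the default parameters are all assumptions made for this example.

.. code-block:: python

    import random


    # Toy stand-ins for real image operations (hypothetical, for illustration only).
    def invert(pixels):
        return [1.0 - p for p in pixels]


    def brighten(pixels):
        return [min(1.0, p + 0.2) for p in pixels]


    def posterize(pixels):
        # Quantize each value to one of four levels.
        return [round(p * 3) / 3 for p in pixels]


    OPS = [invert, brighten, posterize]


    def augmix(pixels, width=3, max_depth=3, alpha=1.0, rng=None):
        """Mix `width` random augmentation chains with the original input."""
        if rng is None:
            rng = random.Random()
        # Dirichlet(alpha, ..., alpha) weights via normalized gamma samples.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(width)]
        total = sum(gammas)
        weights = [g / total for g in gammas]
        # Beta(alpha, alpha) coefficient that blends original and augmented input.
        m = rng.betavariate(alpha, alpha)

        mixed = [0.0] * len(pixels)
        for w in weights:
            chain = list(pixels)
            # Each chain applies between 1 and max_depth randomly chosen operations.
            for _ in range(rng.randint(1, max_depth)):
                chain = rng.choice(OPS)(chain)
            mixed = [acc + w * c for acc, c in zip(mixed, chain)]
        return [m * p + (1.0 - m) * a for p, a in zip(pixels, mixed)]

Because every step is a convex combination of values in ``[0, 1]``, the output stays in the
same range as the input, up to floating-point rounding.
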
=======
Caching
=======
*****************
In-Memory Caching
*****************
OpenVINO™ Training Extensions can cache decoded images in main memory.
If the batch size is large, such as for classification tasks, or if the dataset contains
high-resolution images, image decoding can account for a non-negligible overhead
in data pre-processing.
In those cases, enabling in-memory caching helps maximize GPU utilization and reduce
model training time.

.. code-block::

    $ otx train --mem-cache-size=8GB ..

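
The idea behind a size-bounded cache of decoded images can be sketched as follows.
This is a minimal illustration, not the cache used by OpenVINO™ Training Extensions:
the class ``DecodedImageCache``, the helper ``load_image``, and the policy of refusing new
entries once the byte budget is used up are assumptions made for this example.

.. code-block:: python

    class DecodedImageCache:
        """Keep decoded images in memory until a byte budget is exhausted."""

        def __init__(self, capacity_bytes):
            self.capacity_bytes = capacity_bytes
            self.used_bytes = 0
            self._store = {}

        def get(self, key):
            return self._store.get(key)

        def put(self, key, decoded):
            """Store `decoded` (a bytes-like object); return False if it does not fit."""
            if key in self._store:
                return True
            if self.used_bytes + len(decoded) > self.capacity_bytes:
                return False
            self._store[key] = decoded
            self.used_bytes += len(decoded)
            return True


    def load_image(path, cache, decode):
        """Return the decoded image, calling `decode` only on a cache miss."""
        cached = cache.get(path)
        if cached is not None:
            return cached
        decoded = decode(path)
        cache.put(path, decoded)
        return decoded

In this sketch nothing is ever evicted: images that do not fit in the budget are decoded
again on every access, while cached images skip decoding entirely after the first epoch.
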
***************
Storage Caching
***************
OpenVINO™ Training Extensions uses Datumaro
under the hood for dataset management.
Since Datumaro supports Apache Arrow, OpenVINO™ Training Extensions
can exploit fast data loading from memory-mapped Arrow files at the expense of storage consumption.

.. code-block::

    $ otx train .. params --algo_backend.storage_cache_scheme JPEG/75

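
To give an intuition for why a memory-mapped cache file loads quickly, the sketch below
reads individual records by offset from a memory-mapped file, letting the operating system
page data in on demand. It uses Python's standard ``mmap`` module with a made-up
length-prefixed record layout; it is not the Arrow format and not Datumaro's exporter,
and the helper names are hypothetical.

.. code-block:: python

    import mmap
    import os
    import struct
    import tempfile


    def write_records(path, records):
        """Write length-prefixed byte records and return their file offsets."""
        offsets = []
        with open(path, "wb") as f:
            for rec in records:
                offsets.append(f.tell())
                f.write(struct.pack("<I", len(rec)))  # 4-byte little-endian length
                f.write(rec)
        return offsets


    def read_record(mm, offset):
        """Read one record from a memory-mapped file without scanning the rest."""
        (length,) = struct.unpack_from("<I", mm, offset)
        start = offset + 4
        return bytes(mm[start:start + length])


    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.bin")
        offsets = write_records(path, [b"image-0", b"image-1-bytes"])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                second = read_record(mm, offsets[1])

The trade-off matches the one described above: the records occupy extra storage on disk,
but random access to any sample avoids re-reading and re-parsing the original dataset files.
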
The cache is saved in ``$HOME/.cache/otx`` by default.
To change the location, set the ``OTX_CACHE`` environment variable.

.. code-block::

    $ OTX_CACHE=/path/to/cache otx train .. params --algo_backend.storage_cache_scheme JPEG/75

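
The lookup order described above (environment variable first, default directory otherwise)
can be sketched in a few lines. The helper ``resolve_cache_dir`` is hypothetical and only
mirrors the documented behavior; it is not taken from the OpenVINO™ Training Extensions code base.

.. code-block:: python

    import os
    from pathlib import Path


    def resolve_cache_dir(environ=None):
        """Return the cache directory, preferring the OTX_CACHE override."""
        environ = os.environ if environ is None else environ
        override = environ.get("OTX_CACHE")
        if override:
            return Path(override)
        # Documented default: $HOME/.cache/otx
        return Path.home() / ".cache" / "otx"

Passing a plain dictionary instead of ``os.environ`` makes the helper easy to exercise
without touching the real environment.
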
Please refer to the Datumaro documentation
for the available schemes. We recommend ``JPEG/75`` for fast data loading.
.. [1] Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty" International Conference on Learning Representations. 2020.