Use Self-Supervised Learning#

This tutorial introduces how to train a model using self-supervised learning and how to fine-tune it with the pre-trained weights. OpenVINO™ Training Extensions provides self-supervised learning methods for multi-class classification and semantic segmentation.

The process has been tested on the following configuration:

  • Ubuntu 20.04

  • NVIDIA GeForce RTX 3090

  • Intel(R) Core(TM) i9-10980XE

  • CUDA Toolkit 11.7

Note

This example demonstrates self-supervised learning for classification. Classification and semantic segmentation differ in a few places, so segmentation-specific notes are included throughout the tutorial.

Setup virtual environment#

1. Follow the installation process from the quick start guide to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual environment:

. .otx/bin/activate
# or with the following command, if you created the environment using tox
. venv/otx/bin/activate

Pre-training#

1. In this self-supervised learning tutorial, images from the flowers dataset and the MobileNet-V3-large-1x model are used.

2. Prepare an OpenVINO™ Training Extensions workspace for supervised learning by running the following command:

(otx) ...$ otx build --train-data-roots data/flower_photos --model MobileNet-V3-large-1x

[*] Workspace Path: otx-workspace-CLASSIFICATION
[*] Load Model Template ID: Custom_Image_Classification_MobileNet-V3-large-1x
[*] Load Model Name: MobileNet-V3-large-1x
[*]     - Updated: otx-workspace-CLASSIFICATION/model.py
[*]     - Updated: otx-workspace-CLASSIFICATION/data_pipeline.py
[*]     - Updated: otx-workspace-CLASSIFICATION/deployment.py
[*]     - Updated: otx-workspace-CLASSIFICATION/hpo_config.yaml
[*]     - Updated: otx-workspace-CLASSIFICATION/model_hierarchical.py
[*]     - Updated: otx-workspace-CLASSIFICATION/model_multilabel.py
[*]     - Updated: otx-workspace-CLASSIFICATION/compression_config.json
[*] Update data configuration file to: otx-workspace-CLASSIFICATION/data.yaml

3. Prepare an OpenVINO™ Training Extensions workspace for self-supervised learning by running the following command:

(otx) ...$ otx build --train-data-roots data/flower_photos --model MobileNet-V3-large-1x --train-type Selfsupervised --workspace otx-workspace-CLASSIFICATION-Selfsupervised

[*] Workspace Path: otx-workspace-CLASSIFICATION-Selfsupervised
[*] Load Model Template ID: Custom_Image_Classification_MobileNet-V3-large-1x
[*] Load Model Name: MobileNet-V3-large-1x
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/selfsl/model.py
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/selfsl/data_pipeline.py
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/deployment.py
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/hpo_config.yaml
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/model_hierarchical.py
[*]     - Updated: otx-workspace-CLASSIFICATION-Selfsupervised/model_multilabel.py
[*] Update data configuration file to: otx-workspace-CLASSIFICATION-Selfsupervised/data.yaml

Note

One important point about setting up the workspace for self-supervised learning:

1. It is also possible to pass a directory containing only images to --train-data-roots; in that case --train-type Selfsupervised is not needed, since OpenVINO™ Training Extensions will recognize this training type automatically. However, if you pass a full ImageNet-format dataset (with class sub-folders inside), this option is mandatory, because such a layout cannot be distinguished from one intended for supervised training.
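To make the layout distinction concrete, here is a small sketch (folder and file names are invented for illustration) contrasting a flat image folder, whose training type can be inferred, with an ImageNet-style folder whose class sub-directories make the intent ambiguous:

```shell
# Hypothetical layouts (names invented for illustration):
mkdir -p demo/flat demo/imagenet/daisy demo/imagenet/tulip
touch demo/flat/img_0.jpg demo/imagenet/daisy/img_0.jpg demo/imagenet/tulip/img_1.jpg

# A flat folder has no sub-directories, so the Selfsupervised training
# type can be inferred; class sub-folders look identical to a
# supervised-learning layout, so --train-type must be given explicitly.
flat_subdirs=$(find demo/flat -mindepth 1 -type d | wc -l)
imagenet_subdirs=$(find demo/imagenet -mindepth 1 -type d | wc -l)
echo "flat sub-dirs: $flat_subdirs, imagenet sub-dirs: $imagenet_subdirs"
```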

After the workspace creation, the workspace structure is as follows:

otx-workspace-CLASSIFICATION
├── compression_config.json
├── configuration.yaml
├── data_pipeline.py
├── data.yaml
├── deployment.py
├── hpo_config.yaml
├── model_hierarchical.py
├── model_multilabel.py
├── model.py
├── splitted_dataset
│   ├── train
│   └── val
└── template.yaml
otx-workspace-CLASSIFICATION-Selfsupervised
├── configuration.yaml
├── data.yaml
├── deployment.py
├── hpo_config.yaml
├── model_hierarchical.py
├── model_multilabel.py
├── selfsl
│   ├── data_pipeline.py
│   └── model.py
└── template.yaml

Note

For semantic segmentation, --train-data-roots must be set to a directory containing only images, as shown below.

For example, for the VOC2012 dataset used in the semantic segmentation tutorial, the path data/VOCdevkit/VOC2012/JPEGImages must be set instead of data/VOCdevkit/VOC2012.

The --train-type option is not needed here. For details, please refer to Explanation of Self-Supervised Learning for Semantic Segmentation.

(otx) ...$ otx build --train-data-roots data/VOCdevkit/VOC2012/JPEGImages \
                    --model Lite-HRNet-18-mod2

4. To start training, we need to call the otx train command in the self-supervised learning workspace:

(otx) ...$ cd otx-workspace-CLASSIFICATION-Selfsupervised
(otx) ...$ otx train --data ../otx-workspace-CLASSIFICATION/data.yaml

...

2023-02-23 19:41:36,879 | INFO : Iter [4970/5000]       lr: 8.768e-05, eta: 0:00:29, time: 1.128, data_time: 0.963, memory: 7522, current_iters: 4969, loss: 0.2788
2023-02-23 19:41:46,371 | INFO : Iter [4980/5000]       lr: 6.458e-05, eta: 0:00:19, time: 0.949, data_time: 0.782, memory: 7522, current_iters: 4979, loss: 0.2666
2023-02-23 19:41:55,806 | INFO : Iter [4990/5000]       lr: 5.037e-05, eta: 0:00:09, time: 0.943, data_time: 0.777, memory: 7522, current_iters: 4989, loss: 0.2793
2023-02-23 19:42:05,105 | INFO : Saving checkpoint at 5000 iterations
2023-02-23 19:42:05,107 | INFO : ----------------- BYOL.state_dict_hook() called
2023-02-23 19:42:05,314 | WARNING : training progress 100%
2023-02-23 19:42:05,315 | INFO : Iter [5000/5000]       lr: 4.504e-05, eta: 0:00:00, time: 0.951, data_time: 0.764, memory: 7522, current_iters: 4999, loss: 0.2787
2023-02-23 19:42:05,319 | INFO : run task done.
2023-02-23 19:42:05,323 | INFO : called save_model
2023-02-23 19:42:05,498 | INFO : Final model performance: Performance(score: -1, dashboard: (6 metric groups))
2023-02-23 19:42:05,499 | INFO : train done.
[*] Save Model to: models

Note

To use the same split of the train dataset, set --data ../otx-workspace-CLASSIFICATION/data.yaml instead of using the data.yaml in the self-supervised learning workspace.

Training produces two artifacts, weights.pth and label_schema.json, and we can use the weights to fine-tune the model on the target dataset. The final model performance is reported as -1, but this doesn't matter, because self-supervised learning doesn't measure accuracy. Let's see how to fine-tune the model using the pre-trained weights.
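Before switching workspaces, it can be handy to confirm that the checkpoint was actually written; a minimal sketch (paths follow the workspace layout used in this tutorial):

```shell
# Check that the self-supervised run produced its checkpoint before
# attempting to fine-tune with it (path assumed from this tutorial).
WEIGHTS=otx-workspace-CLASSIFICATION-Selfsupervised/models/weights.pth
if [ -f "$WEIGHTS" ]; then
    echo "checkpoint found: $WEIGHTS"
else
    echo "checkpoint missing: $WEIGHTS (re-run otx train first)" >&2
fi
```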

Fine-tuning#

After pre-training finishes, start fine-tuning by running the command below with the --load-weights argument in the supervised learning workspace.

(otx) ...$ cd ../otx-workspace-CLASSIFICATION
(otx) ...$ otx train --load-weights ../otx-workspace-CLASSIFICATION-Selfsupervised/models/weights.pth

...

2023-02-23 20:56:24,307 | INFO : run task done.
2023-02-23 20:56:28,883 | INFO : called evaluate()
2023-02-23 20:56:28,895 | INFO : Accuracy after evaluation: 0.9604904632152589
2023-02-23 20:56:28,896 | INFO : Evaluation completed
Performance(score: 0.9604904632152589, dashboard: (3 metric groups))

For comparison, we can also measure the performance without the pre-trained weights:

(otx) ...$ otx train

...

2023-02-23 18:24:34,453 | INFO : run task done.
2023-02-23 18:24:39,043 | INFO : called evaluate()
2023-02-23 18:24:39,056 | INFO : Accuracy after evaluation: 0.9550408719346049
2023-02-23 18:24:39,056 | INFO : Evaluation completed
Performance(score: 0.9550408719346049, dashboard: (3 metric groups))

With self-supervised learning, we can obtain well-adapted weights and train the model more accurately. This example showed a modest improvement (0.955 → 0.960), but when only a few samples are available, or the data is too difficult to train a model on from scratch, self-supervised learning can improve model performance more significantly. You can check performance improvement examples in the self-supervised learning for classification documentation.
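The gap looks small in absolute terms, but a quick calculation on the two accuracies reported above shows the relative error reduction, which is often a fairer way to read such comparisons (this snippet is just arithmetic, not part of the OTX CLI):

```shell
# Relative error reduction from the two accuracies reported above:
# (ssl - base) / (1 - base), i.e. the fraction of remaining errors removed.
awk 'BEGIN {
  base = 0.9550408719346049; ssl = 0.9604904632152589;
  printf "error reduction: %.1f%%\n", 100 * (ssl - base) / (1 - base);
}'
```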

Note

Now that we have obtained the new model after fine-tuning, we can proceed with optimization and export as described in the classification tutorial.
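As a pointer toward that next step, here is a hedged sketch of an export invocation (the --load-weights and --output flags are assumed here; consult the classification tutorial for the authoritative commands):

```shell
# Sketch only: export the fine-tuned model to OpenVINO IR.
# Flag names are assumptions; see the classification tutorial for
# the exact invocation. The guard keeps this safe to paste into a
# shell where the otx CLI is not on PATH.
if command -v otx >/dev/null 2>&1; then
    otx export --load-weights models/weights.pth --output exported_model
else
    echo "otx CLI not found; activate the virtual environment first" >&2
fi
```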