Action Detection model#

This live example shows how to train and validate a spatio-temporal action detection model on a subset of JHMDB. To learn more about the action detection task, refer to Action Detection.

Note

To learn more about how to manage the training process of the model, including additional parameters and their modification, refer to the Object Detection model tutorial.

The process has been tested on the following configuration.

  • Ubuntu 20.04

  • NVIDIA GeForce RTX 3090

  • Intel(R) Core(TM) i9-10980XE

  • CUDA Toolkit 11.1

Setup virtual environment#

1. You can follow the installation process from a quick start guide to create a universal virtual environment for OpenVINO™ Training Extensions.
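
If you have not created the environment yet, here is a minimal sketch of the environment creation step (the quick start guide covers the full installation, including installing OpenVINO™ Training Extensions itself):

python3 -m venv .otx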

2. Activate your virtual environment:

. .otx/bin/activate
# or with this line, if you created the environment using tox
. venv/otx/bin/activate

Dataset preparation#

Although we provide code to convert the AVA dataset format to the CVAT dataset format (see this code), for an easy start you can download a subset of the JHMDB dataset, already transformed to the CVAT format, from this link.

If you download the data from the link and extract it to the training_extensions/data folder (you should create the data folder first), you will see the structure below (a minimal extraction sketch follows the tree):

training_extensions
└── data
    └── JHMDB_5%
        ├── train
        │    └── brush_hair_Brushing_Hair_with_Beth_brush_hair_h_nm_np1_le_goo_0
        │        ├── annotations.xml
        │        └── images [40 frames]
        │
        ├── test
        │    └── brush_hair_Aussie_Brunette_Brushing_Long_Hair_brush_hair_u_nm_np1_fr_med_0
        │        ├── annotations.xml
        │        └── images [40 frames]
        │
        ├── train.pkl
        └── test.pkl
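
For reference, here is a minimal sketch of the download-and-extract step (the archive file name is hypothetical; substitute the actual file you downloaded from the link above):

cd training_extensions
mkdir -p data
unzip JHMDB_5%.zip -d data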

Training#

1. First of all, you need to choose which action detection model you want to train. The list of supported templates for action detection is available via the command below:

Note

The characteristics and detailed comparison of the models can be found in the Explanation section.

(otx) ...$ otx find --task action_detection

+------------------+---------------------------------------+---------------+-------------------------------------------------------------------------+
|       TASK       |                   ID                  |      NAME     |                                BASE PATH                                |
+------------------+---------------------------------------+---------------+-------------------------------------------------------------------------+
| ACTION_DETECTION | Custom_Action_Detection_X3D_FAST_RCNN | X3D_FAST_RCNN | src/otx/algorithms/action/configs/detection/x3d_fast_rcnn/template.yaml |
+------------------+---------------------------------------+---------------+-------------------------------------------------------------------------+

To have a specific example in this tutorial, all commands will be run on the X3D_FAST_RCNN model. It’s a lightweight model that achieves competitive accuracy while keeping inference fast.

2. Next, we need to create a workspace for the task.

Let’s prepare an OpenVINO™ Training Extensions action detection workspace by running the following command:

(otx) ...$ otx build x3d_fast_rcnn --train-data-roots ./data/JHMDB_5%/train --val-data-roots ./data/JHMDB_5%/test

[*] Workspace Path: otx-workspace-ACTION_DETECTION
[*] Load Model Template ID: Custom_Action_Detection_X3D_FAST_RCNN
[*] Load Model Name: X3D_FAST_RCNN
[*]     - Updated: otx-workspace-ACTION_DETECTION/model.py
[*]     - Updated: otx-workspace-ACTION_DETECTION/data_pipeline.py
[*] Update data configuration file to: otx-workspace-ACTION_DETECTION/data.yaml

(otx) ...$ cd ./otx-workspace-ACTION_DETECTION

It will create otx-workspace-ACTION_DETECTION with all necessary configs for X3D_FAST_RCNN, a prepared data.yaml to simplify launching CLI commands, and the split dataset.
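
For illustration, the generated data.yaml simply records the dataset roots passed to otx build and looks roughly like this (an approximate sketch; the exact fields may differ between OpenVINO™ Training Extensions versions):

data:
  train:
    data-roots: ../data/JHMDB_5%/train
  val:
    data-roots: ../data/JHMDB_5%/test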

3. To start training, we need to call the otx train command in our workspace:

(otx) ...$ otx train

That’s it! The training will return the artifacts weights.pth and label_schema.json, which are needed as input for further commands: export, eval, optimize, etc.

The training time depends heavily on the hardware characteristics; for example, on a single NVIDIA GeForce RTX 3090 the training took about 70 minutes.
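
If you want to override training hyperparameters from the command line, template parameters can be passed through the params subcommand, described in detail in the Object Detection model tutorial; the values below are purely illustrative:

(otx) ...$ otx train params --learning_parameters.batch_size 4 \
                            --learning_parameters.learning_rate 0.001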

After that, we have the PyTorch action detection model trained with OpenVINO™ Training Extensions.

Validation#

1. otx eval runs evaluation of a trained model on a specific dataset.

The eval function receives the test annotation information and the model snapshot trained in the previous step. Please note that the label_schema.json file contains meta information about the dataset and must be located in the same folder as the model snapshot.

otx eval will output a mAP score for spatio-temporal action detection.

2. The command below will run validation on our dataset and save performance results to the outputs/performance.json file:

(otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
                    --load-weights models/weights.pth \
                    --output outputs

After validation finishes (it takes about 2 minutes), we will get output similar to this:

2023-02-21 22:42:14,540 - mmaction - INFO - Loaded model weights from Task Environment
2023-02-21 22:42:14,540 - mmaction - INFO - Model architecture: X3D_FAST_RCNN
2023-02-21 22:42:14,739 - mmaction - INFO - Patching pre proposals...
2023-02-21 22:42:14,749 - mmaction - INFO - Done.
2023-02-21 22:44:24,345 - mmaction - INFO - Inference completed
2023-02-21 22:44:24,347 - mmaction - INFO - called evaluate()
2023-02-21 22:44:26,349 - mmaction - INFO - Final model performance: Performance(score: 0.5086285195277019, dashboard: (1 metric groups))
2023-02-21 22:44:26,349 - mmaction - INFO - Evaluation completed
Performance(score: 0.5086285195277019, dashboard: (1 metric groups))
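
The same score is saved to the outputs/performance.json file, so you can inspect it afterwards, for example:

(otx) ...$ cat outputs/performance.json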

Note

NNCF optimization is not yet supported for action detection; only PTQ is available, as described in the Optimization section below.

Export#

1. otx export exports a trained PyTorch .pth model to the OpenVINO™ Intermediate Representation (IR) format. This allows running the model on Intel hardware much more efficiently, especially on the CPU. The resulting IR model is also required to run PTQ optimization. The IR model consists of two files: openvino.xml for the model architecture and openvino.bin for the weights.

2. Run the command line below to export the trained model and save the exported model to the openvino folder. A sketch with explicit paths follows the export log below.

(otx) ...$ otx export

2023-03-24 15:03:35,993 - mmdeploy - INFO - Export PyTorch model to ONNX: /tmp/OTX-task-ffw8llin/openvino.onnx.
2023-03-24 15:03:44,450 - mmdeploy - INFO - Args for Model Optimizer: mo --input_model="/tmp/OTX-task-ffw8llin/openvino.onnx" --output_dir="/tmp/OTX-task-ffw8llin/" --output="bboxes,labels" --input="input" --input_shape="[1, 3, 32, 256, 256]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --source_layout=bctwh
2023-03-24 15:03:46,707 - mmdeploy - INFO - [ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/OTX-task-ffw8llin/openvino.xml
[ SUCCESS ] BIN file: /tmp/OTX-task-ffw8llin/openvino.bin

2023-03-24 15:03:46,707 - mmdeploy - INFO - Successfully exported OpenVINO model: /tmp/OTX-task-ffw8llin/openvino.xml
2023-03-24 15:03:46,756 - mmaction - INFO - Exporting completed
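
If you want to specify the checkpoint and the output folder explicitly instead of relying on the workspace defaults, the same flags used elsewhere in this tutorial should apply (a sketch, assuming the workspace layout created above):

(otx) ...$ otx export --load-weights models/weights.pth \
                      --output openvino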

3. Check the accuracy of the IR model and the consistency between the exported model and the PyTorch model by using otx eval and passing the IR model path to the --load-weights parameter.

(otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
                    --load-weights openvino/openvino.xml \
                    --output openvino

...

Performance(score: 0.47351524879614754, dashboard: (3 metric groups))

Optimization#

1. You can further optimize the model with otx optimize. Currently, only PTQ is supported for action detection; NNCF will be supported in the near future.

The optimized model will be quantized to the INT8 format. Refer to the optimization explanation section for more details on model optimization.

2. Example command for optimizing an OpenVINO™ model (.xml) with OpenVINO™ PTQ:

(otx) ...$ otx optimize --load-weights openvino/openvino.xml \
                        --output ptq_model

...

[*] Update data configuration file to: data.yaml
Statistics collection: 100%|████████████████████████████████████████████████| 300/300 [04:16<00:00,  1.17it/s]
Biases correction: 100%|████████████████████████████████████████████████| 168/168 [00:15<00:00, 10.63it/s]
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1572/1572, 7.3 task/s, elapsed: 216s, ETA:     0s
Performance(score: 0.4621155288822204, dashboard: (1 metric groups))

Keep in mind that PTQ takes some time to optimize the model (though generally less than NNCF optimization) and produces little logging output while it runs.
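
As with the exported IR model, you can check the accuracy of the quantized model with otx eval by pointing --load-weights at the optimized model (a sketch, assuming the ptq_model output folder from the command above):

(otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
                    --load-weights ptq_model/openvino.xml \
                    --output ptq_model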

3. Now you have a fully trained, optimized, and exported efficient action detection model that is ready to use.