Action Detection model

This live example shows how to train and validate a spatio-temporal action detection model on a subset of the JHMDB dataset. To learn more about the Action Detection task, refer to Action Detection.

Note

To learn more about managing the training process of the model, including additional parameters and how to modify them, refer to Object Detection model.

The process has been tested on the following configuration.

  • Ubuntu 20.04

  • NVIDIA GeForce RTX 3090

  • Intel(R) Core(TM) i9-10980XE

  • CUDA Toolkit 11.1

Set up virtual environment

1. Follow the installation process from the quick start guide to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual environment:

source .otx/bin/activate
# or use this line if you created the environment with tox
. venv/otx/bin/activate

Dataset preparation

For the action detection task, you need to prepare a dataset in the AVA dataset format. To get started quickly, we provide a sample dataset.
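To give a feel for the annotation files, here is a minimal sketch that parses AVA-style CSV rows. It assumes the standard AVA column order (video ID, timestamp, box corners x1, y1, x2, y2, action ID, person ID) with no header row. The `AvaRow` type and `parse_ava_rows` helper are hypothetical names for illustration, not part of OpenVINO™ Training Extensions, and the sample dataset's files may differ in detail.

```python
import csv
from typing import NamedTuple, Tuple


class AvaRow(NamedTuple):
    """One annotation row (hypothetical helper type for this sketch)."""

    video_id: str
    timestamp: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    action_id: int
    person_id: int


def parse_ava_rows(lines):
    """Parse an iterable of CSV lines, assuming the standard AVA column order."""
    rows = []
    for fields in csv.reader(lines):
        if not fields:  # skip blank lines
            continue
        video_id, timestamp, x1, y1, x2, y2, action_id, person_id = fields[:8]
        box = (float(x1), float(y1), float(x2), float(y2))
        rows.append(AvaRow(video_id, timestamp, box, int(action_id), int(person_id)))
    return rows
```

An open file object can be passed directly, for example `parse_ava_rows(open("data/JHMDB_10%/annotations/ava_train.csv", newline=""))`.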

If you download the data from the link and extract it to the training_extensions/data folder (create the data folder first), you will see the following structure:

training_extensions
└── data
    └── JHMDB_10%
        ├── annotations
        │    ├── ava_action_list_v2.2.pbtxt
        │    ├── ava_test.csv
        │    ├── ava_train.csv
        │    ├── ava_val.csv
        │    ├── test.pkl
        │    ├── train.pkl
        │    └── val.pkl
        │
        └── frames
            ├── train_video001
            │   └── train_video001_0001.jpg
            └── test_video001
                └── test_video001_0001.jpg
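Before training, it can help to confirm that the extracted dataset matches the structure above. The following is a minimal sketch using only the standard library. The `find_layout_problems` helper is a hypothetical name for illustration and is not part of OpenVINO™ Training Extensions.

```python
from pathlib import Path

# Annotation files listed in the directory tree above.
EXPECTED_ANNOTATIONS = [
    "ava_action_list_v2.2.pbtxt",
    "ava_test.csv",
    "ava_train.csv",
    "ava_val.csv",
    "test.pkl",
    "train.pkl",
    "val.pkl",
]


def find_layout_problems(data_root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(data_root)
    problems = []

    annotations = root / "annotations"
    for name in EXPECTED_ANNOTATIONS:
        if not (annotations / name).is_file():
            problems.append(f"missing annotation file: {annotations / name}")

    # Frames are expected as frames/<video_name>/<frame>.jpg
    frames = root / "frames"
    if not frames.is_dir():
        problems.append(f"missing frames directory: {frames}")
    elif not any(frames.glob("*/*.jpg")):
        problems.append(f"no .jpg frames found under: {frames}")

    return problems
```

Calling `find_layout_problems("data/JHMDB_10%")` from the training_extensions folder should return an empty list when the dataset is in place.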

Training

1. First of all, you need to choose which action detection model you want to train. The list of supported recipes for action detection is available with the command line below:

Note

The characteristics and a detailed comparison of the models can be found in the Explanation section.

(otx) ...$ otx find --task ACTION_DETECTION

+-----------------------+--------------------------------------+---------------------------------------------------------------------------------+
|          TASK         |                  Model Name          |                                         Recipe PATH                             |
+-----------------------+--------------------------------------+---------------------------------------------------------------------------------+
| ACTION_DETECTION      | x3d_fast_rcnn                        | ../otx/recipe/action/action_detection/x3d_fast_rcnn.yaml                        |
+-----------------------+--------------------------------------+---------------------------------------------------------------------------------+

To keep this tutorial concrete, all commands are run on the X3D_FAST_RCNN model. It is a lightweight model that achieves competitive accuracy while keeping inference fast.

2. otx train trains a model (a particular model template) on a dataset and saves the results.

Here are the main outputs you can expect with the CLI:

  • {work_dir}/{timestamp}/checkpoints/epoch_*.ckpt - a model checkpoint file.

  • {work_dir}/{timestamp}/configs.yaml - the configuration file used in the training, which can be reused to reproduce the training.

  • {work_dir}/.latest - soft links to the results of the most recently executed subcommands. This lets you omit the checkpoint and config file arguments when you keep working in the same workspace.

# CLI without an explicit recipe
(otx) ...$ otx train --data_root data/JHMDB_10%

# CLI with an explicit recipe
(otx) ...$ otx train --config src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml --data_root data/JHMDB_10%

# Python API, built from the recipe file
from otx.engine import Engine

data_root = "data/JHMDB_10%"
recipe = "src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml"

engine = Engine.from_config(
    config_path=recipe,
    data_root=data_root,
    work_dir="otx-workspace",
)

engine.train(...)

# Python API, built from the model name
from otx.engine import Engine

data_root = "data/JHMDB_10%"

engine = Engine(
    model="x3d_fast_rcnn",  # model name as listed by otx find
    data_root=data_root,
    work_dir="otx-workspace",
)

engine.train(...)

3. (Optional) Additionally, we can tune training parameters such as batch size, learning rate, patience epochs or warm-up iterations. Learn more about specific parameters using otx train --help -v or otx train --help -vv.

For example, to decrease the batch size to 4 and fix the number of epochs at 100, extend the command above with the following options:

# CLI
(otx) ...$ otx train ... --data.config.train_subset.batch_size 4 \
                         --max_epochs 100

# Python API
from otx.core.config.data import DataModuleConfig, SubsetConfig
from otx.core.data.module import OTXDataModule
from otx.engine import Engine

data_config = DataModuleConfig(..., train_subset=SubsetConfig(..., batch_size=4))
datamodule = OTXDataModule(..., config=data_config)

engine = Engine(..., datamodule=datamodule)

engine.train(max_epochs=100)

4. The resulting checkpoints/*.ckpt files are located in the {work_dir}/{timestamp} folder, together with the training logs.

Note

We can also visualize the training with TensorBoard, as these logs are located in {work_dir}/{timestamp}/tensorboard.

otx-workspace
├── 20240403_134256/
│   ├── csv/
│   ├── checkpoints/
│   │   └── epoch_*.ckpt
│   ├── tensorboard/
│   └── configs.yaml
└── .latest
    └── train/
...
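Given the workspace layout above, a script may need to pick up the most recent checkpoint without hard-coding the timestamp. The sketch below is one way to do that with the standard library. It assumes that timestamp folders sort chronologically by name and that epoch numbers are zero-padded (as in epoch_033.ckpt). The `latest_checkpoint` helper is a hypothetical name, not an OpenVINO™ Training Extensions API.

```python
from pathlib import Path


def latest_checkpoint(work_dir):
    """Return the newest {timestamp}/checkpoints/epoch_*.ckpt path, or None."""
    candidates = sorted(
        path
        for path in Path(work_dir).glob("*/checkpoints/epoch_*.ckpt")
        # Skip hidden folders such as .latest, which only holds soft links.
        if not path.parent.parent.name.startswith(".")
    )
    # Lexicographic order puts the latest timestamp and highest epoch last.
    return candidates[-1] if candidates else None
```

For example, `latest_checkpoint("otx-workspace")` returns a path that can be passed to the `--checkpoint` option.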

Training time depends heavily on the hardware. For example, on a single NVIDIA GeForce RTX 3090 the training took about 3 minutes.

After that, we have the PyTorch action detection model trained with OpenVINO™ Training Extensions, which we can use for evaluation, export, optimization and deployment.

Evaluation

1. otx test runs evaluation of a trained model on a particular dataset.

The test function receives the test annotation information and the model snapshot trained in the previous step.

The default metric is the mAP_50 measure.
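As a rough illustration of what the 50 in mAP_50 refers to, the sketch below computes the intersection over union (IoU) of two boxes and checks it against the 0.5 threshold that decides whether a predicted box counts as a match. The function names and the (x1, y1, x2, y2) box layout are assumptions made for this example. This is not how OpenVINO™ Training Extensions computes the metric internally, and it covers only the matching criterion, not the full mean average precision.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True when the prediction overlaps the ground truth with IoU >= 0.5."""
    return box_iou(pred_box, gt_box) >= 0.5
```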

2. The commands below evaluate the snapshot in the otx-workspace folder on the JHMDB_10% dataset and save the results to otx-workspace:

# CLI, using the workspace
(otx) ...$ otx test --work_dir otx-workspace
  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃        Test metric        ┃       DataLoader 0        ┃
  ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
  │      test/data_time       │   0.006367621477693319    │
  │      test/iter_time       │    0.02698644995689392    │
  │         test/map          │    0.10247182101011276    │
  │        test/map_50        │    0.3779516816139221     │
  │        test/map_75        │    0.03639398142695427    │
  │      test/map_large       │    0.11831618845462799    │
  │      test/map_medium      │    0.02958027645945549    │
  │    test/map_per_class     │           -1.0            │
  │      test/map_small       │            0.0            │
  │        test/mar_1         │    0.12753313779830933    │
  │        test/mar_10        │    0.1305265873670578     │
  │       test/mar_100        │    0.1305265873670578     │
  │  test/mar_100_per_class   │           -1.0            │
  │      test/mar_large       │    0.14978596568107605    │
  │      test/mar_medium      │    0.06217033043503761    │
  │      test/mar_small       │            0.0            │
  └───────────────────────────┴───────────────────────────┘

# CLI, with an explicit recipe and checkpoint
(otx) ...$ otx test --config  src/otx/recipe/action/action_detection/x3d_fast_rcnn.yaml \
                    --data_root data/JHMDB_10% \
                    --checkpoint otx-workspace/20240312_051135/checkpoints/epoch_033.ckpt
  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃        Test metric        ┃       DataLoader 0        ┃
  ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
  │      test/data_time       │   0.006367621477693319    │
  │      test/iter_time       │    0.02698644995689392    │
  │         test/map          │    0.10247182101011276    │
  │        test/map_50        │    0.3779516816139221     │
  │        test/map_75        │    0.03639398142695427    │
  │      test/map_large       │    0.11831618845462799    │
  │      test/map_medium      │    0.02958027645945549    │
  │    test/map_per_class     │           -1.0            │
  │      test/map_small       │            0.0            │
  │        test/mar_1         │    0.12753313779830933    │
  │        test/mar_10        │    0.1305265873670578     │
  │       test/mar_100        │    0.1305265873670578     │
  │  test/mar_100_per_class   │           -1.0            │
  │      test/mar_large       │    0.14978596568107605    │
  │      test/mar_medium      │    0.06217033043503761    │
  │      test/mar_small       │            0.0            │
  └───────────────────────────┴───────────────────────────┘

# Python API
engine.test()

3. The output file {work_dir}/{timestamp}/csv/version_0/metrics.csv contains each target metric name together with its value.
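To use these values in a script, the metrics file can be loaded with the standard csv module. The sketch below assumes the file is a regular CSV with a header row of metric names and numeric values, where some cells may be empty. The `read_metrics` helper is a hypothetical name for illustration, not an OpenVINO™ Training Extensions API.

```python
import csv


def read_metrics(csv_path):
    """Return {metric_name: value}, keeping the last non-empty value per column."""
    metrics = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, value in row.items():
                # Skip empty cells and any fields beyond the header columns.
                if name is None or value in (None, ""):
                    continue
                metrics[name] = float(value)
    return metrics
```

For example, `read_metrics(path)["test/map_50"]` returns the reported mAP_50 value, provided that column is present in the file.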