Multi-class Classification

Multi-class classification is the problem of classifying instances into one of two or more classes. We solve this problem in the common fashion: a feature extractor backbone followed by a classifier head that predicts a probability distribution over the categories of the given dataset. For supervised training we use the following algorithm components:

  • Learning rate schedule: ReduceLROnPlateau. This common scheduler lowers the learning rate when a monitored metric stops improving, and it tends to work well for this task across a variety of datasets.

  • Loss function: We use the standard Cross Entropy Loss to train a model. For the class-incremental scenario, however, we use Influence-Balanced Loss (IB loss). IB loss addresses class imbalance: it avoids overfitting to the majority classes by re-weighting the influential samples.

  • Additional training techniques
    • Early stopping: To add adaptability to the training pipeline and prevent overfitting.

    • Balanced Sampler: To build batches whose samples are balanced across classes, which also reduces the number of iterations per epoch.
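
The components listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by the training pipeline: the names `PlateauLR`, `EarlyStopping` and `balanced_batches`, as well as all default values, are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


class PlateauLR:
    """Scale the learning rate by `factor` once the validation loss has
    failed to improve for more than `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def balanced_batches(labels, per_class, seed=0):
    """Yield batches of sample indices with `per_class` samples of every class.

    Assumption for this sketch: the number of batches is set by the
    smallest class, so an epoch is shorter than a full pass over an
    imbalanced dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        rng.shuffle(indices)
    num_batches = min(len(v) for v in by_class.values()) // per_class
    for b in range(num_batches):
        batch = []
        for indices in by_class.values():
            batch.extend(indices[b * per_class:(b + 1) * per_class])
        yield batch
```

In a training loop, one would call `PlateauLR.step` and `EarlyStopping.should_stop` once per epoch with the validation loss. Giving early stopping a larger patience than the scheduler lets the learning rate drop at least once before training is halted.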

Dataset Format

We support a commonly used format for the multi-class image classification task: the ImageNet class folder format. This format has the following structure:

data
├── train
    ├── class 0
        ├── 0.png
        ├── 1.png
        ...
        └── N.png
    ├── class 1
        ├── 0.png
        ├── 1.png
        ...
        └── N.png
    ...
    └── class N
        ├── 0.png
        ├── 1.png
        ...
        └── N.png
└── val
    ...
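
To make the layout concrete, the sketch below walks one split directory (for example `data/train`) and pairs every image with a class index. It is an illustration only, not the loader used by the toolkit: the function name `index_class_folders`, the `IMAGE_SUFFIXES` set, and the choice to number classes in sorted folder-name order are assumptions made for this example.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files in your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def index_class_folders(split_dir):
    """Return (samples, class_to_idx) for a class-folder split directory.

    `samples` is a list of (image_path, class_index) pairs. Each
    sub-folder of `split_dir` is treated as one class, and classes are
    numbered in sorted folder-name order.
    """
    split_dir = Path(split_dir)
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for path in sorted((split_dir / name).iterdir()):
            # Skip anything that is not a regular image file.
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx
```

Calling `index_class_folders("data/train")` and `index_class_folders("data/val")` separately gives one sample list per split. Both splits should contain the same class folders so that the class indices agree.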

Note

Please refer to our dedicated tutorial for more information on how to train, validate, and optimize classification models.

Models

We support the following ready-to-use model recipes:

Model Name            Complexity (GFLOPs)    Model params (M)
MobileNet-V3-large    0.86                   2.97
MobileNet-V3-small    0.22                   0.93
EfficientNet-B0       1.52                   4.09
EfficientNet-B3       3.84                   10.70
EfficientNet-V2-S     5.76                   20.23
EfficientNet-V2-L     48.92                  117.23
DeiT-Tiny             2.51                   22.0
DINO-V2               12.46                  88.0

MobileNet-V3 is the best choice when training time and computational cost are the priority, and it still provides competitive accuracy. EfficientNet-B0/B3 consumes more FLOPs than MobileNet and provides better performance on large datasets. EfficientNet-V2 has more parameters and FLOPs and takes longer to train, but provides superior classification performance. DeiT-Tiny is a transformer-based model that offers a good trade-off between accuracy and computational cost. DINO-V2 produces high-performance visual features that can be used directly with classifiers as simple as linear layers on a variety of computer vision tasks.
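
When compute is the deciding constraint, the figures from the table above can be filtered by a complexity budget. The helper below is a hypothetical convenience for illustration and is not part of the toolkit. The numbers are copied from the table, and the function name `models_within_budget` is an assumption.

```python
# (model name, complexity in GFLOPs, parameters in millions), from the table.
MODEL_RECIPES = [
    ("MobileNet-V3-large", 0.86, 2.97),
    ("MobileNet-V3-small", 0.22, 0.93),
    ("EfficientNet-B0", 1.52, 4.09),
    ("EfficientNet-B3", 3.84, 10.70),
    ("EfficientNet-V2-S", 5.76, 20.23),
    ("EfficientNet-V2-L", 48.92, 117.23),
    ("DeiT-Tiny", 2.51, 22.0),
    ("DINO-V2", 12.46, 88.0),
]


def models_within_budget(max_gflops):
    """Return the names of recipes at or below `max_gflops`, cheapest first."""
    candidates = [m for m in MODEL_RECIPES if m[1] <= max_gflops]
    return [name for name, _, _ in sorted(candidates, key=lambda m: m[1])]
```

For example, `models_within_budget(1.0)` keeps only the two MobileNet-V3 recipes. Complexity alone does not determine accuracy, so the result is a shortlist to evaluate rather than a ranking.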

To see which models are available for the task, run the following command:

(otx) ...$ otx find --task MULTI_CLASS_CLS