Data Framework Convert#

In this notebook, we demonstrate how to use Datumaro to manage a dataset and feed it into a PyTorch training pipeline. The tutorial walks through importing a dataset with Datumaro and converting it into a PyTorch dataset for model training and validation.

Specifically, we will:

  • Load and inspect a dataset using Datumaro.

  • Convert the dataset to a PyTorch-friendly format.

  • Implement a simple training and validation pipeline using PyTorch.

By the end of this notebook, you will know how to use Datumaro to detect a dataset's format, import it, and hand it to a PyTorch DataLoader without writing a custom Dataset class.

Prerequisite#

Download dataset#

This tutorial uses a dataset from Kaggle, so the first step is to download it. The Kaggle command-line tool needs a Kaggle API token; please refer to this guide on how to set it up and download datasets from Kaggle.

We use the ananthu017/emotion-detection-fer dataset. Uncomment the command below to download it and unzip it into ./emotion-detection-fer.

[2]:
# !kaggle datasets download ananthu017/emotion-detection-fer --unzip --path ./emotion-detection-fer

Dataset Preparation#

Import a dataset#

The dataset is organized in the following directory structure:

.
├── test
│   ├── angry
│   ├── disgusted
│   ├── fearful
│   ├── happy
│   ├── neutral
│   ├── sad
│   └── surprised
└── train
    ├── angry
    ├── disgusted
    ├── fearful
    ├── happy
    ├── neutral
    ├── sad
    └── surprised

In the emotion-detection-fer folder, the dataset is split into two top-level directories, train and test. Each contains one subfolder per emotion category (“angry,” “disgusted,” “fearful,” “happy,” “neutral,” “sad,” and “surprised”), and each subfolder holds the images for that emotion. This subset/class folder layout is a common structure for image classification, so Datumaro can detect the format automatically and import the dataset.
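To check the layout by hand before importing it, here is a standard-library sketch of how the `<subset>/<class>/<image>` convention maps folder names to subsets and labels. The function name `scan_subset_dirs` and the set of image extensions are assumptions for illustration; it does not call Datumaro, whose importer performs this mapping itself when it detects the `imagenet_with_subset_dirs` format.

```python
from collections import Counter
from pathlib import Path

# Extensions counted as images; an assumption for this sketch, not Datumaro's list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def scan_subset_dirs(root):
    """Map <root>/<subset>/<class>/<image> to {subset: Counter(class -> image count)}."""
    counts = {}
    for subset_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        per_class = Counter()
        for class_dir in sorted(p for p in subset_dir.iterdir() if p.is_dir()):
            # Folder name becomes the label; only files with an image extension count.
            per_class[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTS
            )
        counts[subset_dir.name] = per_class
    return counts
```

Calling `scan_subset_dirs("./emotion-detection-fer")` should give per-class totals that add up to the subset sizes Datumaro reports, provided every image uses one of the listed extensions.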

[3]:
import datumaro as dm

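dataset_dir = "/home/sooah/data/emotion-detection-fer"  # change to your download path, e.g. ./emotion-detection-fer
data_format = dm.Dataset.detect(dataset_dir)  # name of the single detected format
print(f"Detected data format is '{data_format}'")

dataset = dm.Dataset.import_from(dataset_dir, data_format)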
print(dataset)
Detected data format is 'imagenet_with_subset_dirs'
Dataset
        size=35887
        source_path=/home/sooah/data/emotion-detection-fer
        media_type=<class 'datumaro.components.media.Image'>
        ann_types={<AnnotationType.label: 1>}
        annotated_items_count=35887
        annotations_count=35887
subsets
        test: # of items=7178, # of annotated items=7178, # of annotations=7178
        train: # of items=28709, # of annotated items=28709, # of annotations=28709
infos
        categories
        1: ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

Based on the information provided:

  • The total size of the dataset is 35,887 items.

  • The dataset is divided into two subsets:

    • The ‘test’ subset contains 7,178 items.

    • The ‘train’ subset contains 28,709 items.

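This gives a sense of the dataset's scale: the subsets form roughly an 80/20 train/test split (28,709 vs. 7,178 items), and because the number of annotations equals the number of items, every item carries exactly one label, as expected for single-label classification. Later in the notebook, the test subset serves as the validation set.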
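The summary reports subset sizes but not how the items are spread across the seven emotions, which matters for a classifier. A minimal sketch: `split_fractions` turns the printed subset sizes into proportions, and `label_histogram` tallies (subset, label) pairs. The adapter `datumaro_label_pairs` is a hypothetical helper that assumes the Datumaro attributes it uses (`item.subset`, `item.annotations`, `ann.type`, `ann.label`, and indexing label categories by id); Datumaro is imported only when that helper is actually iterated.

```python
from collections import Counter


def split_fractions(subset_sizes):
    """Return each subset's share of the total item count."""
    total = sum(subset_sizes.values())
    return {name: size / total for name, size in subset_sizes.items()}


def label_histogram(pairs):
    """Count (subset, label_name) pairs, e.g. one pair per label annotation."""
    return Counter(pairs)


def datumaro_label_pairs(dataset):
    """Hypothetical helper: yield (subset, label_name) for each label annotation."""
    import datumaro as dm  # lazy import so the rest of the sketch runs without Datumaro

    label_cats = dataset.categories()[dm.AnnotationType.label]
    for item in dataset:
        for ann in item.annotations:
            if ann.type == dm.AnnotationType.label:
                yield item.subset, label_cats[ann.label].name
```

For example, `split_fractions({"train": 28709, "test": 7178})` gives about 0.8 and 0.2, and `label_histogram(datumaro_label_pairs(dataset))` would give per-subset class counts for the dataset imported above.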

Convert Datumaro dataset into PyTorch dataset#

To convert the Datumaro dataset into PyTorch datasets, we use FrameworkConverter from Datumaro's framework_converter plugin. We first define the image transformation with torchvision.transforms; here it only converts each image to a tensor with transforms.ToTensor(). We then create one PyTorch dataset per subset by passing the subset name ("train" or "test") and the task ("classification") to FrameworkConverter and calling to_framework(framework="torch", transform=transform). Finally, we print the length of each converted dataset to confirm that it matches the subset size Datumaro reported.
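Conceptually, the object returned by `to_framework(framework="torch", ...)` is a map-style dataset: `len()` gives the item count, and indexing yields a transformed image and an integer label, as the `inputs, labels = data` unpacking in the training loop implies. The class below is a hypothetical, framework-free stand-in that illustrates that contract; it is not Datumaro's implementation. Its `to_float_chw` transform mimics what `ToTensor` does to a single-channel 8-bit image (add a channel axis and scale to [0, 1]), using nested lists instead of tensors.

```python
class ListClassificationDataset:
    """Hypothetical map-style dataset: __len__ plus __getitem__ -> (image, label)."""

    def __init__(self, samples, transform=None):
        self.samples = list(samples)  # list of (image, label_index) pairs
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label = self.samples[idx]
        if self.transform is not None:
            image = self.transform(image)  # applied lazily, per item
        return image, label


def to_float_chw(gray_image):
    """Mimic ToTensor on an H x W uint8 image: add a channel axis, scale to [0, 1]."""
    return [[[pixel / 255.0 for pixel in row] for row in gray_image]]
```

Any object with `__len__` and `__getitem__` can be handed to PyTorch's `DataLoader` as a map-style dataset; in the real pipeline, `transforms.ToTensor()` produces tensors so that the default collate function can stack them into batches.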

[4]:
from torchvision import transforms
from datumaro.plugins.framework_converter import FrameworkConverter

transform = transforms.Compose([transforms.ToTensor()])

multi_framework_dataset = FrameworkConverter(dataset, subset="train", task="classification")
train_dataset = multi_framework_dataset.to_framework(
    framework="torch",
    transform=transform,
)

multi_framework_dataset = FrameworkConverter(dataset, subset="test", task="classification")
val_dataset = multi_framework_dataset.to_framework(
    framework="torch",
    transform=transform,
)

print(f"Converted train dataset len is '{len(train_dataset)}'")
print(f"Converted validation dataset len is '{len(val_dataset)}'")
Converted train dataset len is '28709'
Converted validation dataset len is '7178'

Building the PyTorch Training and Validation Pipeline#

Creating Data Loaders for Efficient Data Handling#

In this section, we create data loaders for the training and validation datasets. PyTorch's DataLoader groups items into batches: the training loader reshuffles the data every epoch to help generalization, while the validation loader keeps a fixed order so that evaluation is deterministic. A batch size of 64 gives 449 training batches and 113 validation batches, as printed below. With these loaders in place, we can feed both datasets into the training loop.
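The batch counts printed by the next cell follow from ceiling division, because `DataLoader` keeps a final partial batch unless `drop_last=True`. A small sketch of that arithmetic; the helper names are hypothetical.

```python
def loader_batches(num_items, batch_size, drop_last=False):
    """Number of batches a DataLoader yields over a map-style dataset."""
    if drop_last:
        return num_items // batch_size  # the partial batch is discarded
    return (num_items + batch_size - 1) // batch_size  # integer ceiling division


def last_batch_size(num_items, batch_size):
    """Size of the final, possibly partial, batch when drop_last=False."""
    remainder = num_items % batch_size
    return remainder if remainder else batch_size
```

With 28,709 training and 7,178 validation items, this gives 449 and 113 batches, and the last validation batch holds only 10 images, which matters for how accuracy is averaged later in the notebook.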

[5]:
from torch.utils.data import DataLoader

training_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
validation_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

print(f"Training Loader Batches: {len(training_loader)}")
print(f"Validation Loader Batches: {len(validation_loader)}")
Training Loader Batches: 449
Validation Loader Batches: 113

Modeling#

Model Architecture Definition#

In this section, we define the model, starting from MobileNetV2 pretrained on ImageNet, a lightweight architecture well suited to image classification. Transfer learning lets us reuse the features learned on ImageNet for the emotion detection task. Because the images are single-channel (grayscale), we replace the first convolution, which expects 3-channel RGB input, with one that takes 1 input channel; this new layer is randomly initialized rather than pretrained. We also replace the final fully connected layer so that it outputs one score per emotion class. Finally, we move the model to the GPU for training and inference.
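A quick sanity check that small images survive MobileNetV2's downsampling: the stem convolution and the first block of four later stages each use a 3x3 kernel with stride 2 and padding 1, and the standard convolution output-size formula (the one in PyTorch's `Conv2d` documentation) gives the resulting feature-map size. The 48x48 input size is an assumption (it is the resolution of FER-2013, whose 35,887-image total this dataset matches); the notebook itself never prints an image shape.

```python
def conv_out(size, kernel=3, stride=2, padding=1, dilation=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def mobilenet_v2_feature_size(input_size, num_stride2_layers=5):
    """Spatial sizes through MobileNetV2's stride-2 layers (stem + 4 stages)."""
    sizes = [input_size]
    for _ in range(num_stride2_layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes
```

For a 48x48 input the sizes are 48, 24, 12, 6, 3, 2; torchvision's MobileNetV2 then average-pools the final map to 1x1 before the classifier, which is why it accepts inputs much smaller than the 224x224 ImageNet resolution (which ends at 7x7).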

[6]:
from torchvision.models import mobilenet_v2
import torch

model = mobilenet_v2(weights="IMAGENET1K_V1")
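# Replace the stem convolution so it accepts 1-channel grayscale input.
# features[0] is a Conv2d + BatchNorm2d + ReLU6 block; indexing [0][0] swaps only the Conv2d.
model.features[0][0] = torch.nn.Conv2d(
    1, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
)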
# Get the number of input features for the last layer
num_features = model.classifier[1].in_features

# Create a new classifier layer with the number of classes
num_classes = len(dataset.categories()[dm.AnnotationType.label])
model.classifier[1] = torch.nn.Linear(num_features, num_classes)

# Move the model to the GPU (this notebook assumes a CUDA device is available)
model = model.cuda()

Training and Validation Loop#

In this section, we implement the training and validation loop for the emotion detection model. The top_k_accuracy helper returns the percentage of samples in a batch whose true label is among the model's k highest-scoring classes; the loop uses k=1, which is ordinary accuracy. We use cross-entropy loss, the standard choice for multi-class classification, and stochastic gradient descent (SGD) with a learning rate of 0.001 and momentum of 0.9 to update the model's parameters. During training, we print the average loss over each block of 100 batches to track learning progress. After each epoch, we evaluate the model on the validation set and report the mean of the per-batch top-1 accuracies.
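To make the `topk`/`eq` logic in `top_k_accuracy` concrete, here is a plain-Python reference of the same metric for a batch of score lists: a sample counts as correct when its true label is among the `k` highest-scoring classes. It is a sketch for checking the idea by hand, not a replacement for the tensor version; ties are broken by lower class index here, which may differ from `torch.topk`.

```python
def top_k_accuracy_ref(scores, labels, k=1):
    """Percentage of samples whose label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices of the k largest scores (ties resolved by lower index first).
        top_k = sorted(range(len(row)), key=lambda c: (-row[c], c))[:k]
        correct += label in top_k
    return 100.0 * correct / len(labels)
```

For scores `[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]` with labels `[1, 1, 0]`, top-1 accuracy is one third of 100%, top-2 is two thirds, and top-3 is 100%.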

[7]:
def top_k_accuracy(output, labels, k=1):
    """Compute the top-k accuracy given model output and labels."""
    with torch.no_grad():
        batch_size = labels.size(0)
        _, pred = output.topk(k, 1, True, True)
        pred = pred.t()
        correct = pred.eq(labels.view(1, -1).expand_as(pred))
        # reshape rather than view: for k > 1, correct may be non-contiguous and view(-1) would fail
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        return correct_k.mul_(100.0 / batch_size).item()


# Define loss function and optimizer
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

EPOCHS = 10
for epoch in range(EPOCHS):
    print(f"EPOCH {epoch + 1}:")

    # Training phase
    model.train()
    running_loss = 0.0
    for i, data in enumerate(training_loader):
        inputs, labels = data
        inputs, labels = inputs.cuda(), labels.cuda()

        optimizer.zero_grad()
        outputs = model(inputs)

        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()

        # Gather data and report
        running_loss += loss.item()
        if (i + 1) % 100 == 0:
            print(f"\t [TRAIN] batch {i + 1} loss: {running_loss / 100:.4f}")
            running_loss = 0.0

    # Validation phase
    model.eval()
    accs = 0.0
    with torch.no_grad():
        for i, vdata in enumerate(validation_loader):
            inputs, labels = vdata
            inputs, labels = inputs.cuda(), labels.cuda()

            outputs = model(inputs)
            top1_acc = top_k_accuracy(outputs, labels, k=1)
            accs += top1_acc

    avg_accs = accs / (i + 1)
    print(f"\t [VAL] validation accuracy: {avg_accs:.2f}%")
EPOCH 1:
         [TRAIN] batch 100 loss: 1.8547
         [TRAIN] batch 200 loss: 1.7383
         [TRAIN] batch 300 loss: 1.6348
         [TRAIN] batch 400 loss: 1.5933
         [VAL] validation accuracy: 40.84%
EPOCH 2:
         [TRAIN] batch 100 loss: 1.4857
         [TRAIN] batch 200 loss: 1.4612
         [TRAIN] batch 300 loss: 1.4012
         [TRAIN] batch 400 loss: 1.3967
         [VAL] validation accuracy: 48.21%
EPOCH 3:
         [TRAIN] batch 100 loss: 1.2735
         [TRAIN] batch 200 loss: 1.2806
         [TRAIN] batch 300 loss: 1.2650
         [TRAIN] batch 400 loss: 1.2792
         [VAL] validation accuracy: 51.14%
EPOCH 4:
         [TRAIN] batch 100 loss: 1.1394
         [TRAIN] batch 200 loss: 1.1445
         [TRAIN] batch 300 loss: 1.1760
         [TRAIN] batch 400 loss: 1.1557
         [VAL] validation accuracy: 52.51%
EPOCH 5:
         [TRAIN] batch 100 loss: 1.0302
         [TRAIN] batch 200 loss: 1.0563
         [TRAIN] batch 300 loss: 1.0757
         [TRAIN] batch 400 loss: 1.0815
         [VAL] validation accuracy: 52.39%
EPOCH 6:
         [TRAIN] batch 100 loss: 0.9378
         [TRAIN] batch 200 loss: 0.9302
         [TRAIN] batch 300 loss: 1.0006
         [TRAIN] batch 400 loss: 0.9811
         [VAL] validation accuracy: 51.48%
EPOCH 7:
         [TRAIN] batch 100 loss: 0.8105
         [TRAIN] batch 200 loss: 0.8475
         [TRAIN] batch 300 loss: 0.9001
         [TRAIN] batch 400 loss: 0.9265
         [VAL] validation accuracy: 54.57%
EPOCH 8:
         [TRAIN] batch 100 loss: 0.7378
         [TRAIN] batch 200 loss: 0.7624
         [TRAIN] batch 300 loss: 0.8293
         [TRAIN] batch 400 loss: 0.8538
         [VAL] validation accuracy: 54.17%
EPOCH 9:
         [TRAIN] batch 100 loss: 0.6630
         [TRAIN] batch 200 loss: 0.6890
         [TRAIN] batch 300 loss: 0.7210
         [TRAIN] batch 400 loss: 0.7865
         [VAL] validation accuracy: 52.42%
EPOCH 10:
         [TRAIN] batch 100 loss: 0.5968
         [TRAIN] batch 200 loss: 0.6473
         [TRAIN] batch 300 loss: 0.6737
         [TRAIN] batch 400 loss: 0.7167
         [VAL] validation accuracy: 55.29%
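The validation accuracy logged above is the mean of per-batch top-1 accuracies. Because the last validation batch holds only 10 of the 7,178 images, that mean gives each of those 10 images more weight than the others; weighting each batch by its size gives the exact dataset-level accuracy. With 113 batches the two differ by under one percentage point here, but the sketch below, using made-up illustrative batch numbers rather than results from this run, shows how they can diverge.

```python
def batch_mean_accuracy(batch_accs):
    """Unweighted mean of per-batch accuracies, as the validation loop reports."""
    return sum(batch_accs) / len(batch_accs)


def sample_weighted_accuracy(batch_accs, batch_sizes):
    """Accuracy over all samples: each batch weighted by its number of items."""
    total = sum(batch_sizes)
    return sum(acc * n for acc, n in zip(batch_accs, batch_sizes)) / total
```

In the training cell, the weighted version amounts to accumulating `top1_acc * labels.size(0)` and dividing by `len(val_dataset)` instead of by the number of batches.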

Model Fine-Tuning and Further Improvements#

MobileNetV2 provides a reasonable baseline for this task, reaching 55.29% top-1 validation accuracy after 10 epochs, but the training log also shows room for improvement: the training loss keeps falling every epoch, while validation accuracy levels off between roughly 51% and 55% from epoch 3 onward, a sign that the model is starting to overfit. Experimenting with other architectures, such as ResNet or EfficientNet, or adjusting MobileNetV2's layers and hyperparameters could fit the dataset better. Transfer learning from models pretrained on large face or emotion recognition datasets may also help the model capture subtle facial expressions and improve emotion detection accuracy.
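One low-cost adjustment along these lines is to keep the weights from the epoch with the best validation accuracy and to stop once it has not improved for `patience` epochs (early stopping). The sketch below applies that rule to the ten validation accuracies logged above; the function name and the patience values are illustrative assumptions, and in practice the check would run inside the epoch loop alongside saving a checkpoint, for example with `torch.save(model.state_dict(), ...)`.

```python
def early_stopping_epoch(val_accs, patience):
    """Return (best_epoch, stop_epoch), 1-based; stop_epoch is None if training runs to the end."""
    best_acc, best_epoch, waited = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0  # new best: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Validation accuracies (%) copied from the training log above.
logged = [40.84, 48.21, 51.14, 52.51, 52.39, 51.48, 54.57, 54.17, 52.42, 55.29]
```

With `patience=2`, training would stop after epoch 6 and keep the epoch-4 weights (52.51%), missing the later gains; with `patience=3`, it runs all ten epochs and keeps epoch 10 (55.29%). The noisy validation curve is a reason not to set the patience too low.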

Conclusion#

In this notebook, we used Datumaro to detect, import, and inspect the emotion-detection-fer dataset and to convert it into PyTorch datasets. Datumaro recognized the existing train/test folder structure, so no manual parsing or splitting was needed, and the only preprocessing step was converting images to tensors.

Using MobileNetV2, a lightweight yet effective architecture, we applied transfer learning to facial emotion recognition and reached 55.29% top-1 accuracy on the validation (test) subset after 10 epochs. MobileNetV2's efficient design and low computational requirements make it a practical choice for projects that prioritize speed and model efficiency.

Through the complete training and validation pipeline, we showed how MobileNetV2 can be fine-tuned for a specific emotion detection task, with Datumaro handling dataset preparation and compatibility with PyTorch.

Future improvements could include data augmentation, more complex model architectures, or further hyperparameter tuning to improve accuracy. We hope this notebook serves as a practical guide to using Datumaro and MobileNetV2 for emotion detection and similar classification tasks.