Level 12: Framework Conversion
Datumaro allows seamless conversion of datasets to popular deep learning frameworks, such as PyTorch and TensorFlow. This is particularly useful when you are working with a dataset that needs to be used across different frameworks without manual reformatting.
Datumaro provides the FrameworkConverter class, which can be used to convert a dataset for various tasks like classification, detection, and segmentation.
- Supported Tasks
  - Classification
  - Multilabel Classification
  - Detection
  - Instance Segmentation
  - Semantic Segmentation
  - Tabular Data
With the PyTorch framework, you can convert a Datumaro dataset like this:
from datumaro.plugins.framework_converter import FrameworkConverter
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor()])

dm_dataset = ...  # Load your dataset here

# First, we have to specify the dataset, subset, and task
multi_framework_dataset = FrameworkConverter(dm_dataset, subset="train", task="classification")

# Through this, we convert the dataset to PyTorch format
train_dataset = multi_framework_dataset.to_framework(framework="torch", transform=transform)

# Now we can use the train_dataset with PyTorch DataLoader
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=32)
In this example:
- subset="train" indicates that we are working with the training portion of the dataset.
- task="classification" specifies that this is a classification task.
- The to_framework method converts the dataset to a PyTorch-compatible format.
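To make "PyTorch-compatible" concrete, here is a minimal pure-Python sketch of the map-style dataset protocol (`__len__` and `__getitem__`) that a PyTorch DataLoader consumes. It is not Datumaro's implementation: the `Item` and `SubsetClassificationView` classes and the `scale` transform are hypothetical names introduced only for illustration, and the sketch assumes a classification dataset yields (image, label) pairs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item (not Datumaro's own item type)."""
    image: Any
    label: int
    subset: str


class SubsetClassificationView:
    """Hypothetical map-style view exposing one subset as (image, label) pairs."""

    def __init__(
        self,
        items: List[Item],
        subset: str,
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        # Keep only the items that belong to the requested subset.
        self._items = [item for item in items if item.subset == subset]
        self._transform = transform

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        item = self._items[index]
        image = item.image
        # Apply the optional transform to the image only, not to the label.
        if self._transform is not None:
            image = self._transform(image)
        return image, item.label


def scale(image: List[int]) -> List[float]:
    """Toy transform: scale 8-bit values to [0, 1], loosely mimicking ToTensor."""
    return [value / 255 for value in image]


items = [
    Item(image=[0, 128, 255], label=0, subset="train"),
    Item(image=[255, 255, 0], label=1, subset="train"),
    Item(image=[10, 20, 30], label=1, subset="val"),
]

train_view = SubsetClassificationView(items, subset="train", transform=scale)
print(len(train_view))
print(train_view[0])
```

Any object that implements these two methods can be indexed and batched by a DataLoader, which is why the subset and task have to be fixed before conversion: they determine which items are visible and what each index returns.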