MVTec Dataset
MVTec AD Dataset (CC BY-NC-SA 4.0).
- Description:
- This module contains the PyTorch Dataset, DataLoader and PyTorch
Lightning DataModule for the MVTec AD dataset.
- If the dataset is not on the file system, the module downloads and
extracts the dataset and creates the PyTorch data objects.
- License:
The MVTec AD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/).
- Reference:
Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision 129(4):1038-1059, 2021, DOI: 10.1007/s11263-020-01400-4.
Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584-9592, 2019, DOI: 10.1109/CVPR.2019.00982.
- class anomalib.data.mvtec.MVTec(root: Path | str, category: str, image_size: int | tuple[int, int] | None = None, center_crop: int | tuple[int, int] | None = None, normalization: str | InputNormalizationMethod = InputNormalizationMethod.IMAGENET, train_batch_size: int = 32, eval_batch_size: int = 32, num_workers: int = 8, task: TaskType = TaskType.SEGMENTATION, transform_config_train: str | A.Compose | None = None, transform_config_eval: str | A.Compose | None = None, test_split_mode: TestSplitMode = TestSplitMode.FROM_DIR, test_split_ratio: float = 0.2, val_split_mode: ValSplitMode = ValSplitMode.SAME_AS_TEST, val_split_ratio: float = 0.5, seed: int | None = None)
Bases: AnomalibDataModule
MVTec Datamodule.
- Parameters:
root (Path | str) – Path to the root of the dataset
category (str) – Category of the MVTec dataset (e.g. “bottle” or “cable”).
image_size (int | tuple[int, int] | None, optional) – Size of the input image. Defaults to None.
center_crop (int | tuple[int, int] | None, optional) – When provided, the images will be center-cropped to the provided dimensions.
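normalization (str | InputNormalizationMethod, optional) – Normalization method applied to the input images. Defaults to InputNormalizationMethod.IMAGENET, which normalizes the images to the ImageNet statistics.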
train_batch_size (int, optional) – Training batch size. Defaults to 32.
eval_batch_size (int, optional) – Test batch size. Defaults to 32.
num_workers (int, optional) – Number of workers. Defaults to 8.
task (TaskType, optional) – Task type: ‘classification’, ‘detection’ or ‘segmentation’. Defaults to TaskType.SEGMENTATION.
transform_config_train (str | A.Compose | None, optional) – Config for pre-processing during training. Defaults to None.
transform_config_eval (str | A.Compose | None, optional) – Config for pre-processing during evaluation (validation and testing). Defaults to None.
test_split_mode (TestSplitMode, optional) – Setting that determines how the testing subset is obtained. Defaults to TestSplitMode.FROM_DIR.
test_split_ratio (float, optional) – Fraction of images from the train set that will be reserved for testing. Defaults to 0.2.
val_split_mode (ValSplitMode, optional) – Setting that determines how the validation subset is obtained. Defaults to ValSplitMode.SAME_AS_TEST.
val_split_ratio (float, optional) – Fraction of train or test images that will be reserved for validation. Defaults to 0.5.
seed (int | None, optional) – Seed which may be set to a fixed value for reproducibility.
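The interplay of the split ratios and the seed can be illustrated with a small standalone sketch. It is not anomalib's implementation: `split_by_ratio` is a hypothetical helper, and which split mode consumes which ratio is not spelled out above, so the two calls below only show how a fraction and a fixed seed yield a reproducible subset.

```python
import random


def split_by_ratio(items, ratio, seed=None):
    """Randomly move a fraction `ratio` of `items` into a held-out list."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    indices = list(range(len(items)))
    # A local Random instance keeps the shuffle reproducible for a fixed seed
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_held_out = int(len(items) * ratio)
    held_out = [items[i] for i in sorted(indices[:n_held_out])]
    kept = [items[i] for i in sorted(indices[n_held_out:])]
    return kept, held_out


train_images = [f"{i:03d}.png" for i in range(10)]
# Reserve 20% of the train images for testing (cf. test_split_ratio=0.2).
train_subset, test_subset = split_by_ratio(train_images, ratio=0.2, seed=0)
# Reserve half of that test subset for validation (cf. val_split_ratio=0.5).
test_subset, val_subset = split_by_ratio(test_subset, ratio=0.5, seed=0)
```

With ten images this leaves eight for training and one each for testing and validation. Repeating the calls with the same seed selects the same files.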
- prepare_data_per_node
If True, each LOCAL_RANK=0 will call prepare data. Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.
- allow_zero_length_dataloader_with_multiple_devices
If True, dataloader with zero length within local rank is allowed. Default value is False.
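A minimal usage sketch of the datamodule, assuming anomalib is installed and the signature documented above. The helper name `build_mvtec_datamodule`, the `./datasets/MVTec` path and the image size of 256 are illustrative choices, not values prescribed by the library.

```python
def build_mvtec_datamodule(root="./datasets/MVTec", category="bottle"):
    """Create the MVTec datamodule for one category and prepare its splits."""
    # Imported lazily so the helper can be defined without anomalib installed.
    from anomalib.data.mvtec import MVTec

    datamodule = MVTec(
        root=root,
        category=category,
        image_size=256,        # illustrative value; the default is None
        train_batch_size=32,
        eval_batch_size=32,
        num_workers=8,
    )
    # Standard Lightning DataModule hooks: prepare_data() fetches the data
    # if it is not on the file system, setup() builds the dataset splits.
    datamodule.prepare_data()
    datamodule.setup()
    return datamodule
```

After calling the helper, `datamodule.train_dataloader()`, `datamodule.val_dataloader()` and `datamodule.test_dataloader()` return the usual PyTorch dataloaders.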
- class anomalib.data.mvtec.MVTecDataset(task: TaskType, transform: A.Compose, root: Path | str, category: str, split: str | Split | None = None)
Bases: AnomalibDataset
MVTec dataset class.
- Parameters:
task (TaskType) – Task type: classification, detection or segmentation
transform (A.Compose) – Albumentations Compose object describing the transforms that are applied to the inputs.
split (str | Split | None) – Split of the dataset, usually Split.TRAIN or Split.TEST
root (Path | str) – Path to the root of the dataset
category (str) – Sub-category of the dataset, e.g. ‘bottle’
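The dataset class can also be used on its own. The sketch below assumes anomalib and albumentations are installed, and that `TaskType` can be imported from `anomalib.data` (an assumption about the package layout, not something stated on this page). The helper name, the path and the transform pipeline are illustrative.

```python
def build_mvtec_train_dataset(root="./datasets/MVTec", category="bottle"):
    """Create the training split of one MVTec AD category."""
    # Lazy imports: the helper can be defined without the libraries installed.
    import albumentations as A
    from albumentations.pytorch import ToTensorV2
    from anomalib.data import TaskType  # assumed import location
    from anomalib.data.mvtec import MVTecDataset

    # Resize, normalize with albumentations' default (ImageNet) statistics,
    # then convert the image to a torch tensor.
    transform = A.Compose([A.Resize(256, 256), A.Normalize(), ToTensorV2()])
    return MVTecDataset(
        task=TaskType.SEGMENTATION,
        transform=transform,
        root=root,
        category=category,
        split="train",
    )
```

Depending on the anomalib version, the returned dataset may need a `setup()` call before its samples can be accessed.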
- anomalib.data.mvtec.make_mvtec_dataset(root: str | Path, split: str | Split | None = None, extensions: Sequence[str] | None = None) → DataFrame
Create MVTec AD samples by parsing the MVTec AD data file structure.
- The files are expected to follow the structure:
path/to/dataset/split/category/image_filename.png
path/to/dataset/ground_truth/category/mask_filename.png
This function creates a dataframe to store the parsed information based on the following format:
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
|   | path          | split | label  | image_path   | mask_path                             | label_index |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
| 0 | datasets/name | test  | defect | filename.png | ground_truth/defect/filename_mask.png | 1           |
|---|---------------|-------|--------|--------------|---------------------------------------|-------------|
- Parameters:
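root (str | Path) – Path to the dataset.
split (str | Split | None, optional) – Dataset split (i.e., either train or test). Defaults to None.
extensions (Sequence[str] | None, optional) – File extensions to be included in the dataset. Defaults to None.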
split_ratio (float, optional) – Ratio to split normal training images and add to the test set in case test set doesn’t contain any normal images. Defaults to 0.1.
seed (int, optional) – Random seed to ensure reproducibility when splitting. Defaults to 0.
create_validation_set (bool, optional) – Whether to create a validation set from the test set. The MVTec AD dataset does not contain a validation set; set this flag to True to create one.
Examples
The following example shows how to get training samples from MVTec AD bottle category:
>>> from pathlib import Path
>>> from anomalib.data.mvtec import make_mvtec_dataset
>>> root = Path('./MVTec')
>>> category = 'bottle'
>>> path = root / category
>>> path
PosixPath('MVTec/bottle')

>>> samples = make_mvtec_dataset(path, split='train')
>>> samples.head()
           path  split label                       image_path                                    mask_path  label_index
0  MVTec/bottle  train  good  MVTec/bottle/train/good/105.png  MVTec/bottle/ground_truth/good/105_mask.png            0
1  MVTec/bottle  train  good  MVTec/bottle/train/good/017.png  MVTec/bottle/ground_truth/good/017_mask.png            0
2  MVTec/bottle  train  good  MVTec/bottle/train/good/137.png  MVTec/bottle/ground_truth/good/137_mask.png            0
3  MVTec/bottle  train  good  MVTec/bottle/train/good/152.png  MVTec/bottle/ground_truth/good/152_mask.png            0
4  MVTec/bottle  train  good  MVTec/bottle/train/good/109.png  MVTec/bottle/ground_truth/good/109_mask.png            0
- Returns:
A dataframe containing the samples of the dataset.
- Return type:
DataFrame
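The directory layout and column format described above can be illustrated with a standard-library sketch. It is not the library's implementation: `parse_mvtec_like_tree` is a hypothetical helper that returns plain dictionaries instead of a DataFrame. It also assumes that masks are named `<stem>_mask.png`, that the `good` label marks normal images, and that normal images have no mask.

```python
import tempfile
from pathlib import Path


def parse_mvtec_like_tree(root, split=None, extensions=(".png",)):
    """Collect one row per image found under root/<split>/<label>/<file>."""
    root = Path(root)
    rows = []
    for image_path in sorted(root.glob("*/*/*")):
        if image_path.suffix not in extensions:
            continue
        split_name, label = image_path.parts[-3], image_path.parts[-2]
        if split_name == "ground_truth":
            continue  # masks are attached to their images below
        if split is not None and split_name != split:
            continue
        is_anomalous = label != "good"
        # Assumption: masks live at ground_truth/<label>/<stem>_mask.png.
        mask_path = root / "ground_truth" / label / f"{image_path.stem}_mask.png"
        rows.append(
            {
                "path": str(root),
                "split": split_name,
                "label": label,
                "image_path": str(image_path),
                "mask_path": str(mask_path) if is_anomalous else "",
                "label_index": int(is_anomalous),
            }
        )
    return rows


# Build a tiny throwaway tree that mimics one category and parse it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bottle"
    for relative in (
        "train/good/000.png",
        "test/good/000.png",
        "test/broken_large/000.png",
        "ground_truth/broken_large/000_mask.png",
    ):
        file_path = root / relative
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()
    all_rows = parse_mvtec_like_tree(root)
    test_rows = parse_mvtec_like_tree(root, split="test")
```

Each dictionary corresponds to one row of the table format above, so the list could be passed to `pandas.DataFrame` to obtain a comparable frame.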