Image data augmentation - ViT

stephanps · July 8, 2022, 1:50pm

Hey everyone, I’ve been scouring the internet the last few days trying to find an answer to the following question, but it’s still a bit unclear to me. When augmenting data using map or set_transform, I’ve noticed that the size of the training set does not increase, so I got confused as to what’s actually happening. In my mind, if it’s not adding additional data, then it doesn’t make sense. I think I may have an understanding now, though, and I would appreciate it if someone could confirm or correct my current understanding, which is as follows:

Data is loaded into a Dataset and split into train, dev and test sets → DatasetDict.
The set_transform method is applied to the dataset with whichever function has been passed as a parameter.
The training data remains unchanged until the model is trained.
At each epoch, the transformations are applied to the input data, so the amount of training data stays constant, but variation is added through the transformations.
Although no examples are added, the effective increase in training data comes from running more epochs with several different random transformations applied each time, which should lead to better inference.

So in essence, the training data doesn’t actually get “augmented” in the sense that it becomes more, but instead there is a multiplier effect because of the transformations at each epoch, provided the number of epochs increases until performance peaks.
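To make the "multiplier effect" concrete, here is a minimal pure-Python sketch. It does not use the 🤗 Datasets library: `OnTheFlyDataset` and `random_flip` are hypothetical stand-ins, and each "image" is just a tuple of pixel values. It only illustrates that the dataset length stays constant while the model can see more distinct inputs over several epochs.

```python
import random


class OnTheFlyDataset:
    """Toy stand-in for a dataset whose transform runs at access time."""

    def __init__(self, examples, transform):
        self.examples = examples
        self.transform = transform

    def __len__(self):
        # The stored data never grows, whatever the transform does.
        return len(self.examples)

    def __getitem__(self, idx):
        # The transform is applied every time an example is read.
        return self.transform(self.examples[idx])


rng = random.Random(0)


def random_flip(image):
    # Reverse the "image" half of the time (a stand-in for a horizontal flip).
    return image[::-1] if rng.random() < 0.5 else image


ds = OnTheFlyDataset([(1, 2, 3), (4, 5, 6), (7, 8, 9)], random_flip)

seen = set()
for epoch in range(10):
    for i in range(len(ds)):
        seen.add(ds[i])

print(len(ds))    # number of stored examples, unchanged by the transform
print(len(seen))  # number of distinct inputs seen across all epochs
```

With only one random flip there are at most two variants per example, so the gain is small here. With richer random transformations (crops, rotations, color jitter), the number of distinct inputs keeps growing with the number of epochs.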

Is this the correct understanding of how data augmentation works for a ViT model using DatasetDict and set_transform?

Thank you!

lhoestq · July 28, 2022, 10:21am

Hi ! Yes this is 100% correct ^^

Because the data augmentation function passed to set_transform is random, it returns a different image each time you access the same example. This is especially useful when training a model for several epochs. It is a way to artificially augment the size of your dataset.
