Image data augmentation - ViT

Hey everyone, I’ve been scouring the internet for the last few days trying to find an answer to the following question, but it’s still a bit unclear to me. When augmenting data using map or set_transform, I’ve noticed that the size of the training set does not increase, so I got confused as to what’s actually happening. In my mind, if it’s not adding additional data, then it doesn’t make sense. I think I may have an understanding now, though, and I would appreciate it if someone could confirm or correct my current understanding, which is as follows:

  1. Data is loaded into a Dataset and split into train, dev and test sets → DatasetDict.
  2. The set_transform method is applied to the dataset with whichever function has been passed as a parameter.
  3. Training data remains unchanged until the model is trained.
  4. At each epoch, the transformations are applied to the input data as it is read, so the amount of training data stays constant, but variation is added through the transformations.
  5. Although no examples are added, the effective increase in training data comes from training for more epochs with the same set of random transformations, which should lead to better inference.

So in essence, the training data doesn’t actually get “augmented” in the sense that it becomes more, but instead there is a multiplier effect because of the transformations at each epoch, provided the number of epochs increases until performance peaks.
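To make the "multiplier effect" concrete, here is a minimal toy sketch in plain Python. It is not the datasets library: ToyDataset, random_shift and the tiny 1-D "images" are hypothetical stand-ins, and the only assumption is that the transform runs every time an example is read. The stored size stays fixed while the number of distinct views the model could see grows with the number of epochs.

```python
import random

rng = random.Random(0)  # seeded so the run is repeatable


class ToyDataset:
    """Hypothetical stand-in for a dataset with an on-the-fly transform."""

    def __init__(self, items):
        self.items = list(items)
        self.transform = None

    def set_transform(self, fn):
        # Only remember the function; nothing is modified or copied here.
        self.transform = fn

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        # The transform runs at access time, on every access.
        return self.transform(item) if self.transform else item


def random_shift(row):
    """Toy augmentation: rotate a 1-D 'image' by a random offset."""
    k = rng.randrange(len(row))
    return tuple(row[k:] + row[:k])


train = ToyDataset([
    [1, 2, 3, 4, 5, 6, 7, 8],
    [10, 20, 30, 40, 50, 60, 70, 80],
])
train.set_transform(random_shift)

seen = set()
for epoch in range(10):
    for i in range(len(train)):
        seen.add((i, train[i]))  # record each distinct augmented view

print("stored examples:", len(train))
print("distinct views seen over 10 epochs:", len(seen))
```

In this toy setup each example has at most 8 possible rotations, so the count of distinct views is capped at 16; with richer random transformations on real images the cap is far higher.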

Is this the correct understanding of how data augmentation works for a ViT model using DatasetDict and set_transform?

Thank you!

Hi! Yes, this is 100% correct ^^

Because the data augmentation function passed to set_transform is random, you get a different image each time you access the same example. This is especially useful when training a model for several epochs. It is a way to artificially augment the size of your dataset :wink: