Data augmentation for images (ViT) using Hugging Face

Hi everyone,

I am currently training a ViT on a local dataset of mine. I used the Hugging Face dataset template to create my own dataset class.

To train my model I use PyTorch tooling (Trainer, etc.), and I would like to apply some data augmentation to my images.

Does Hugging Face support data augmentation for images? If not, and I should use PyTorch for the augmentation, how should I proceed?

Thank you


The feature extractors (like ViTFeatureExtractor) are fairly minimal, and typically only support resizing images and normalizing the channels. For all kinds of image augmentations, you can use torchvision’s transforms or albumentations, for example.

Hi, thanks for the reply.

To be more specific, what is the best way to implement on-the-fly data augmentation during training with torchvision? (My dataset is already very large, so I can’t create the augmented dataset up front and then load it; that would take far too much time and memory.)

Is it feasible to do this in the generate_examples function of the dataset class?

If not, where would you advise me to do it?

If you can share a few snippets of code (just to give me an idea of the implementation you have in mind), I’d be more than happy.



You’re in luck, because we’ve recently added an image classification script to the examples folder of the Transformers library. It illustrates how to use Torchvision’s transforms (such as CenterCrop, RandomResizedCrop) on the fly in combination with HuggingFace Datasets, using the .set_transform() method.

Amazing !

Thanks a lot

Just a question: since I am using PyTorch Lightning for training, if I apply the transforms.Compose operation in preprocess_images (the function that basically does a moveaxis and applies feature_extractor, as you defined here: Transformers-Tutorials/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub), will these transformations be applied on the fly during training as in your example (so that each epoch sees a different version of the same image)? Or does it create a fixed version of the dataset, with the augmentation performed only once at that point?

I ask because your example above uses the Hugging Face Trainer, so maybe it handles data augmentation differently from the PyTorch Lightning trainer in order to do it on the fly.



I’m bumping this issue; I haven’t been able to solve it so far.

Any update on this? I’ve been trying to use set_transform with the trainer provided by PyTorch Lightning to apply transformations on the fly, but the behavior is not as expected. In the transform function I only receive a single item instead of a batch, and the data types are not the ones defined in set_format().
Any help would be much appreciated.

Pinging @lhoestq here for what’s the best way to perform image augmentations on-the-fly with Datasets.

According to the set_transform documentation:

A formatting function is a callable that takes a batch (as a dict) as input and returns a batch.

so it will always be one dictionary that is passed to your transform, but the values in the dictionary are lists of size batch_size. Can you check if this is the case?

If this is the case then you are good and you can process the examples by batch.

But if you have only lists of 1 element, then the issue might come from the data loader.

Indeed, by default the PyTorch data loader loads the items of a batch from the dataset one by one, like this:

batch = [dataset[idx] for idx in range(start, end)]

Therefore the augmentation function passed to set_transform is called batch_size times, each time with a single element. For the function to receive more than one item per call, the dataset should be indexed like this instead:

batch = dataset[start:end]
# or
batch = dataset[list_of_indices]

I think you can change the PyTorch data loading behavior to work this way if you use a BatchSampler.
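A minimal sketch of what I mean, assuming the dataset's __getitem__ accepts a list of indices (as dataset[list_of_indices] above). ToyDataset is a hypothetical stand-in written just for this illustration; it records how many indices each call receives so you can see the difference.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, RandomSampler

class ToyDataset(Dataset):
    """Hypothetical stand-in for a dataset that can be indexed with a list."""

    def __init__(self, n):
        self.data = torch.arange(n, dtype=torch.float32)
        self.calls = []  # number of indices received by each __getitem__ call

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # `indices` is a whole list here, so a batched transform
        # would see every item of the batch in a single call.
        self.calls.append(len(indices))
        return {"x": self.data[indices]}

dataset = ToyDataset(10)

# The BatchSampler yields lists of indices; batch_size=None turns off the
# loader's own batching, so each list is passed to __getitem__ as is.
sampler = BatchSampler(RandomSampler(dataset), batch_size=4, drop_last=False)
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

batches = [batch["x"] for batch in loader]
```

With this setup dataset.calls ends up holding one entry per batch instead of one entry per item, which is the behavior you want for a batched set_transform function.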

Let me know if that helps !

Hi @lhoestq. We picked this up again and, yes, your answer helped a lot! Many thanks for the support!