Data augmentation for image (ViT) using Hugging Face

Unknown-User · September 6, 2021, 10:05pm

Hi everyone,

I am currently doing the training of a ViT on a local dataset of mine. I have used the dataset template of hugging face to create my own dataset class.

To train my model I use pytorch functions (Trainer etc…), and I would like to do some data augmentation on my images.

Does hugging face allow data augmentation for images ? Otherwise, guessing I should use pytorch for the data augmentation, how could I proceed ?

Thank you

nielsr · September 7, 2021, 7:11am

Hi,

the feature extractors (like ViTFeatureExtractor) are fairly minimal, and typically only support resizing of images and normalizing the channels. For all kinds of image augmentations, you can use torchvision’s transforms or albumentations for example.

Unknown-User · September 7, 2021, 1:00pm

Hi, thanks for the reply.

Being more specific, what is the best way to implement a data augmentation on-the-fly during the training using torchvision ? (since my dataset is already very large I can’t create the augmented dataset and then load it, it would take way too much time and memory)

Is it feasible in the function generate_examples of the dataset class ?

If not, where would you advise me to do it ?

If you can put some bricks of code (just to have the idea of the implementation you have in mind) I’d be more than happy.

Thanks

nielsr · September 9, 2021, 9:57am

Hi,

You’re in luck, cause we’ve recently added an image classification script to the examples folder of the Transformers library. It illustrates how to use Torchvision’s transforms (such as CenterCrop, RandomResizedCrop) on the fly in combination with HuggingFace Datasets, using the .set_transform() method.

Unknown-User · September 13, 2021, 8:16am

Amazing !

Thanks a lot

Just a question : since I am using pytorch lightning for the training, if I apply the transforms.Compose operation in the preprocess_images (the function doing basically a moveaxis and applying feature_extractor as you defined here : Transformers-Tutorials/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub), will these transformations be made on the fly during the training as you do in your example (seeing each epoch a different version of the same image) or does it create a fixed version of the dataset with data augmentation only performed at this time ?

Because I see that in your example above you use a Hugging Face Trainer, so maybe it handles data augmentation differently than the pytorch lightning trainer, in order to make it on the fly.

Thanks

Unknown-User · September 22, 2021, 10:01am

Hey,

I allow myself to up this issue, I couldn’t solve it for the moment.

DavidJimenez · October 25, 2021, 4:05pm

Any update on this? I’ve been trying to use set_transform with the trainer provided by pytorch lightning to apply transformation on-the-fly, but the behavior is not as expected. In the transform function I only receive an item instead of a batch and the data types are not the ones defined in set_format().
Any help would be much appreciated.

nielsr · October 25, 2021, 5:16pm

Pinging @lhoestq here for what’s the best way to perform image augmentations on-the-fly with Datasets.

lhoestq · October 27, 2021, 2:03pm

According to the set_transform documentation:

A formatting function is a callable that takes a batch (as a dict) as input and returns a batch.

so it will always be one dictionary that is passed you your transform, but the values in the dictionary are lists of size batch_size. Can you check if this is the case ?

If this is the case then you are good and you can process the examples by batch

But if you have only lists of 1 element, then the issue might come from the data loader.

Indeed, by default the pytorch data loader load batches of data from a dataset one by one like this:

batch = [dataset[idx] for idx in range(start, end)]

Therefore the augmentation function passed to set_transform is called batch_size times with one element. For the function to get more than one item per execution, it should be used like this instead:

batch = dataset[start:end]
# or
batch = dataset[list_of_indices]

I think you can change the pytorch data loading behavior to work this way if you use the BatchSampler

Let me know if that helps !

DavidJimenez · December 10, 2021, 10:49am

Hi @lhoestq . We have retaken this and, yes, your answer helped a lot! Many thanks for the support!

Topic		Replies	Views
Hugging face datasets and applying transformations Beginners	0	303	February 21, 2024
Image data augmentation - ViT Beginners	1	1206	July 28, 2022
Datasets - how to add augmentations? 🤗Datasets	1	601	October 25, 2023
How to ensure GPU utilisation when preprocessing huggingface datasets Beginners	1	736	April 27, 2024
Fine-tuning image classification with data augmentation using Trainer Beginners	0	1102	April 21, 2023

Data augmentation for image (ViT) using Hugging Face

Related topics