Hugging Face datasets and applying transformations

RESOLVER101757 · February 21, 2024, 12:48pm

I'm a little confused about where to apply transformations.

I have a Hugging Face dataset of images that I pass to a PyTorch DataLoader. I'm already applying transformations in the Hugging Face dataset to resize the images and normalize the data. This overwrites the existing dataset values, and it is working fine for me. However, I want to add some more transformations that, rather than replacing the images, create additional, slightly altered copies, so that I have variations for things like night time or a brighter day. Is the best place to do this the Hugging Face dataset (on the fly) or the PyTorch DataLoader (on the fly)? Some things are unclear to me:

If I create an extra copy via the transformations, so that the Hugging Face dataset holds both the original and the transformed copies of the images, and the DataLoader is set to, say, a batch size of 100, would each batch returned while looping actually contain 100 or 200 images?
In what circumstances would you apply transformations at the Hugging Face dataset level?
In what circumstances would you apply transformations at the PyTorch DataLoader / Dataset level?
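
To make the first question concrete, here is a minimal sketch, assuming a plain PyTorch stand-in rather than my real Hugging Face dataset. `FakeImages`, the `brightness` factor, and the image sizes are made up for illustration; only `Dataset`, `ConcatDataset`, and `DataLoader` are real PyTorch classes. It concatenates the originals with a darker copy and records what the loader yields with `batch_size=100`.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class FakeImages(Dataset):
    """Hypothetical stand-in for the real image dataset: random 3x8x8 tensors."""

    def __init__(self, n=1000, brightness=1.0):
        gen = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 3, 8, 8, generator=gen)
        self.brightness = brightness  # 1.0 leaves the image unchanged

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Applied on the fly, so the stored images are never overwritten.
        return (self.images[idx] * self.brightness).clamp(0.0, 1.0)


original = FakeImages(brightness=1.0)
darker = FakeImages(brightness=0.5)  # assumed "night time" variant
combined = ConcatDataset([original, darker])  # 1000 originals + 1000 copies

loader = DataLoader(combined, batch_size=100, shuffle=True)
batch_shapes = [tuple(batch.shape) for batch in loader]
num_batches = len(batch_shapes)

print(len(combined), num_batches, batch_shapes[0])
```

In this sketch each batch still holds 100 images. Doubling the dataset doubles the number of batches (20 instead of 10), not the batch size.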
