How to structure an image dataset repo using the image folder approach?

Hi!

Thanks for your detailed answer and the time.

I am just wondering if there is some rule of thumb for how many images the ImageFolder approach remains suitable. I am currently curating a dataset with 1.5k images, and I noticed that load_dataset() takes a long time (~5 minutes).
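For reference, this is roughly how I load it at the moment (the data_dir path is just a placeholder for my local folder):

```python
from datasets import load_dataset

# "data/" is a placeholder for a local folder laid out in the
# class-subfolder structure that the imagefolder builder expects.
dataset = load_dataset("imagefolder", data_dir="data")
```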

From this forum discussion about image dataset best practices, I know that ImageFolder is highly inefficient for data streaming. Still, I don't know whether the same applies to plain dataset loading. Is it possible to tar the folder structure to speed up loading? If so, does it require a custom loading script?
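To be concrete, here is a sketch of what I have in mind, assuming the built-in webdataset builder can read such archives without a custom script (the shard names are hypothetical):

```python
from datasets import load_dataset

# Hypothetical shards: each .tar packs image files together with
# same-named .json files holding the labels, as the WebDataset
# format expects.
dataset = load_dataset(
    "webdataset",
    data_files={"train": "shards/train-*.tar"},
    split="train",
)
```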

best,
Cristóbal