Hi!
Thanks for your detailed answer and the time.
I am just wondering if there is a rule of thumb for how many images the ImageFolder
approach is suitable. Currently I am curating a dataset with 1.5k images, and I noticed that load_dataset()
takes a lot of time (~5 minutes).
From this forum discussion about image dataset best practices, I know that ImageFolder
is highly inefficient for data streaming, but I don't know whether the same applies to simply loading the dataset. Is it possible to tar the folder structure to speed up data loading? If so, does that require a custom loading script?
Best,
Cristóbal