Hi!
Thanks for your detailed answer and the time.
I am just wondering if there is a rule of thumb for how many images the ImageFolder
approach is suitable. Currently I am curating a dataset with 1.5k images, and I noticed that load_dataset()
takes a lot of time (~5 minutes).
From this forum discussion about image dataset best practices, I know that ImageFolder
is highly inefficient for data streaming, but I don't know whether the same applies to simply loading the dataset. Is it possible to tar the folder structure to speed up data loading? If so, does that require a custom loading script?
Best,
Cristóbal