Thanks for your answer and the interesting pointers!
I am using the ImageFolder structure currently, but:
- I cannot get it to work with a “calibration” split name
- It’s extremely slow to download since it loads files one by one (1h20 yesterday when I tried to download it all)
- It does not allow custom split strategies (like the leave_out="cat" I mentioned)
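
To make the leave_out idea concrete, here is a minimal sketch in plain Python of what I have in mind. The function name `filter_leave_out` and the file list are made up for illustration; this is not an existing `datasets` API, and I am assuming the class name is the parent folder, as in an ImageFolder layout.

```python
from pathlib import PurePosixPath


def filter_leave_out(paths, leave_out=None):
    """Return the paths whose class folder is not `leave_out`.

    Assumes an ImageFolder-style layout: <class_name>/<file_name>.
    With leave_out=None the full list is returned unchanged.
    """
    if leave_out is None:
        return list(paths)
    # The class label is taken from the immediate parent folder name.
    return [p for p in paths if PurePosixPath(p).parent.name != leave_out]


# Hypothetical file list, for illustration only.
paths = ["cat/001.png", "dog/001.png", "bird/001.png", "cat/002.png"]

full = filter_leave_out(paths)                       # entire dataset
without_cat = filter_leave_out(paths, leave_out="cat")  # dataset minus one class
```

With something like this applied at load time, every "minus one class" variant would come from the same set of files instead of needing its own copy.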
By the way, since executing the dataset builder directly from Hub is no longer recommended,
Hmmm that’s a bummer.
it might be more convenient to publish the built data set if you want to make it public.
Could you explain what you mean by “built” please? When I browse other datasets, they never upload individual files like I did (it seems stupid to, so I expected that); they often use parquet (I don’t think it’s very appropriate for images? Maybe zip would be better?). Is that what you mean?
Or do you mean “built” as in “publish it 11 times with 11 strategies in 11 folders (entire dataset + 10 times minus one class)”?
All the best.