We just released datasets 2.16.1, which optimizes data file resolution and makes it possible to load datasets with millions of images. It also requires huggingface-hub >= 0.20.1.
Older versions of datasets and huggingface-hub are slow at handling that many files.
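If you are not sure which versions your environment has, a quick check before loading a large image dataset can save time. Below is a minimal sketch that uses only the standard library. The helper names (parse_release, is_at_least, check_environment) are hypothetical. Its version parser is deliberately simplified and ignores pre-release suffixes such as "dev0", so packaging.version would be the more robust choice in real code.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the announcement above.
# Upgrade with: pip install -U "datasets>=2.16.1" "huggingface-hub>=0.20.1"
MINIMUMS = {"datasets": "2.16.1", "huggingface-hub": "0.20.1"}


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '2.16.1' into (2, 16, 1).

    Simplification (assumption): only the leading numeric release segment is
    read, so a suffix like '.dev0' or 'rc1' is ignored rather than compared.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    a, b = parse_release(installed), parse_release(required)
    # Pad with zeros so that '2.16' compares equal to '2.16.0'.
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


def check_environment(minimums: dict[str, str] = MINIMUMS) -> dict[str, tuple[str | None, bool]]:
    """Map each package name to (installed version or None, meets minimum?)."""
    report: dict[str, tuple[str | None, bool]] = {}
    for name, required in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)  # not installed at all
            continue
        report[name] = (installed, is_at_least(installed, required))
    return report


if __name__ == "__main__":
    for pkg, (found, ok) in check_environment().items():
        status = "ok" if ok else f"upgrade needed (>= {MINIMUMS[pkg]})"
        print(f"{pkg}: {found or 'not installed'} -> {status}")
```

Once both checks pass, loading works as usual through datasets.load_dataset. The faster data file resolution comes from the library upgrade itself, not from any change in how you call it.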