Building an imagefolder dataset takes too long

There are about 20,000 images with accompanying text files in a local folder, and building an imagefolder dataset from them took about 30 minutes. The build process appears to traverse the folders performing a series of verification checks. How should this be handled when there are billions of files?
Code:

```python
dataset = load_dataset(
    'imagefolder',
    data_dir='/home/data/ms_coco/val2017/',
    streaming=True,
    ignore_verifications=True,
    cache_dir='/home/data/ms_coco/huggingface/valid',
)
```
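For context on why streaming helps at scale, here is a minimal plain-Python sketch of the idea behind lazy loading: instead of scanning every file and verifying it up front, a generator walks the directory tree with `os.scandir` and yields (image path, caption) pairs on demand, so the first examples are available almost immediately. This is an illustration of the concept, not the `datasets` library's actual implementation; the caption-file naming convention (`.txt` next to each image) is an assumption for the example.

```python
import itertools
import os
from pathlib import Path

def iter_image_caption_pairs(root):
    """Lazily walk `root`, yielding (image_path, caption) pairs.

    os.scandir streams directory entries rather than materializing the
    full listing up front, so iteration starts immediately even for
    very large folders -- the same idea behind streaming=True.
    NOTE: the sibling-`.txt` caption convention is an assumption here.
    """
    stack = [Path(root)]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                path = Path(entry.path)
                if entry.is_dir():
                    stack.append(path)
                elif path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
                    caption_file = path.with_suffix(".txt")
                    caption = (
                        caption_file.read_text() if caption_file.exists() else ""
                    )
                    yield str(path), caption

# Usage: grab just the first few examples without touching the rest
# of the tree (path below is hypothetical):
# first_three = list(
#     itertools.islice(iter_image_caption_pairs("/home/data/images/"), 3)
# )
```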

Please help me :grinning:.

Hi! Can you interrupt the process (Ctrl+C or Cmd+C) while waiting for it to finish and paste the returned error stack trace here to help us debug the issue? Also, what's the output of the `datasets-cli env` command?