There are roughly 20,000 images, each with accompanying text, in a local folder, and building an imagefolder dataset from them took about 30 minutes. The build process appears to traverse the folders and run a series of verification checks. How should this be handled when there are billions of files?
```python
dataset = load_dataset(
    'imagefolder',
    data_dir='/home/data/ms_coco/val2017/',
    streaming=True,
    ignore_verifications=True,
    cache_dir='/home/data/ms_coco/huggingface/valid',
)
```
Please help me.
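For context on the billions-of-files case, one common workaround (not something the snippet above does; this is an assumption about a general technique, WebDataset-style sharding) is to pack the loose image files into a modest number of tar archives up front, so that a loader can stream whole shards sequentially instead of listing and verifying every individual file. A minimal sketch using only the Python standard library, with a hypothetical `pack_to_shards` helper:

```python
import tarfile
import tempfile
from pathlib import Path

SHARD_SIZE = 1000  # images per shard; tune for your storage and loader

def pack_to_shards(src_dir: Path, out_dir: Path, shard_size: int = SHARD_SIZE) -> int:
    """Pack every file under src_dir into out_dir/shard-NNNNNN.tar.

    Returns the number of shards written. Archiving turns billions of
    per-file metadata operations into a few sequential archive reads.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    shard_idx = 0
    for start in range(0, len(files), shard_size):
        with tarfile.open(out_dir / f"shard-{shard_idx:06d}.tar", "w") as tar:
            for p in files[start:start + shard_size]:
                tar.add(p, arcname=p.name)
        shard_idx += 1
    return shard_idx

if __name__ == "__main__":
    # Demo with stand-in files; point src at a real image directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "images"
        src.mkdir()
        for i in range(2500):
            (src / f"{i:05d}.jpg").write_bytes(b"\xff\xd8fake")
        n = pack_to_shards(src, Path(tmp) / "shards")
        print(n)  # 2500 files / 1000 per shard -> 3 shards
```

The shard size is a trade-off: larger shards mean fewer files to enumerate but coarser parallelism when multiple workers each read their own subset of shards.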