How to load a large-scale text-image pair dataset

ImageFolder’s file resolution is currently not optimized for datasets of this size. In your case, instead of calling load_dataset, it’s best to either write a dataset loading script or build the dataset with Dataset.from_generator, passing it a generator that yields {"image": pil_image, "text": text} dictionaries.
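
Here is a minimal sketch of the Dataset.from_generator route. It assumes a layout the original question does not spell out: a single directory where each image (for example 0001.jpg) sits next to a caption file with the same stem (0001.txt). The helper names find_pairs, generate_examples and build_dataset are made up for illustration, so adapt the pairing logic to however your files are actually organized.

```python
from pathlib import Path

# Assumed image extensions; extend this set to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_pairs(data_dir):
    """Return sorted (image_path, caption_path) pairs that share a file stem."""
    pairs = []
    for image_path in sorted(Path(data_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.is_file():  # images without a caption are skipped
            pairs.append((image_path, caption_path))
    return pairs


def generate_examples(data_dir):
    """Yield one {"image": pil_image, "text": text} dict per pair."""
    from PIL import Image  # Pillow, imported here so find_pairs works without it

    for image_path, caption_path in find_pairs(data_dir):
        text = caption_path.read_text(encoding="utf-8").strip()
        with Image.open(image_path) as image:
            image.load()  # read the pixel data before the file handle is closed
        yield {"image": image, "text": text}


def build_dataset(data_dir):
    """Build a Dataset by streaming examples from the generator."""
    from datasets import Dataset  # Hugging Face datasets

    # gen_kwargs is forwarded to generate_examples as keyword arguments.
    return Dataset.from_generator(
        generate_examples, gen_kwargs={"data_dir": str(data_dir)}
    )
```

Calling build_dataset("path/to/pairs") (a placeholder path) should then give you a dataset with an image column and a text column. The generator only opens one image at a time, so memory use should not depend on how many files are in the directory.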
