Hugging Face Forums

Using External Datasets with HuggingFace Data Loader

mariosasko April 26, 2022, 1:20pm 8

> It’s surprising that such a prominent dataset is so complicated to load. Note that even this solution leaves the class names as n000XXX rather than the correlated text in words.txt but that is not critical to my current task.

This dataset is tricky to load because it doesn’t follow the standard image folder structure. You can use map, similar to the map calls in my snippet, to replace the class names with the corresponding words from words.txt.
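A minimal sketch of that map step, under a few assumptions that are not from the thread: words.txt holds one tab-separated "class ID, description" pair per line, and the dataset stores the n000XXX-style IDs as strings in a column called class_name here (both column names are hypothetical).

```python
def parse_words(lines):
    """Build {class_id: description} from words.txt-style lines.

    Assumes each line looks like "<class_id>\t<description>".
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        class_id, _, words = line.partition("\t")
        mapping[class_id] = words
    return mapping


def add_class_words(batch, id_to_words,
                    source_column="class_name",    # hypothetical column name
                    target_column="class_words"):  # hypothetical column name
    """Batched map function: return a new column with readable class names.

    IDs missing from the mapping are kept unchanged.
    """
    return {
        target_column: [id_to_words.get(cid, cid) for cid in batch[source_column]]
    }


# Possible usage with a datasets.Dataset named `dataset`:
# with open("words.txt", encoding="utf-8") as f:
#     id_to_words = parse_words(f)
# dataset = dataset.map(add_class_words, batched=True,
#                       fn_kwargs={"id_to_words": id_to_words})
```

Returning only the new column leaves the original IDs in place, so you can still check the mapping against them afterwards.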

> and it threw “ValueError: operands could not be broadcast together with shapes (224,224) (3,)” on the first line.

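It would be easier to debug this error from the actual code, but your notebook is not public, so I’d assume that some of the images are grayscale: such an image has shape (224,224) with no channel axis, which cannot be broadcast against values of shape (3,), most likely the per-channel normalization values. Replacing the line: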

inputs = feature_extractor([x for x in example_batch['image']], return_tensors='pt')

with

inputs = feature_extractor([x.convert("RGB") for x in example_batch['image']], return_tensors='pt')

should fix the issue.
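Since the grayscale guess can't be confirmed without the notebook, here is a small sketch for checking it and applying the fix in one place. The helper names count_modes and ensure_rgb are hypothetical; they rely only on the .mode attribute and .convert() method that PIL images provide.

```python
from collections import Counter


def count_modes(images):
    """Count how many images use each PIL mode (e.g. "RGB", "L", "RGBA")."""
    return Counter(img.mode for img in images)


def ensure_rgb(images):
    """Return the images with every non-RGB one converted to 3-channel RGB.

    Images that are already RGB are passed through untouched.
    """
    return [img if img.mode == "RGB" else img.convert("RGB") for img in images]


# Possible usage inside the transform from the thread:
# print(count_modes(example_batch['image']))  # any "L" entries are grayscale
# inputs = feature_extractor(ensure_rgb(example_batch['image']),
#                            return_tensors='pt')
```

If count_modes reports only "RGB", the grayscale assumption is wrong and the error comes from somewhere else.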

