It's surprising that such a prominent dataset is so complicated to load. Note that even this solution leaves the class names as n000XXX rather than the corresponding text in words.txt, but that is not critical to my current task.
This dataset is tricky to load because it doesn't follow the standard image folder structure. You can also use map, similar to the map calls from my snippet, to replace the class names with the corresponding words.
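As a rough sketch of that idea, the helper below builds a lookup from words.txt and adds a readable name to each example. It assumes words.txt is tab-separated (ID, then description) and that each example stores its class ID under a key called "wnid"; both the key name and the "class_words" output column are hypothetical, so adjust them to your dataset.

```python
def parse_words(lines):
    """Build {class_id: description} from tab-separated lines.

    Assumed line format (not verified against your copy of words.txt):
    'n01443537<TAB>goldfish, Carassius auratus'
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        class_id, _, words = line.partition("\t")
        mapping[class_id] = words
    return mapping


def add_class_words(example, id_to_words, id_key="wnid"):
    """Return a copy of `example` with a readable 'class_words' field.

    `id_key` is a hypothetical column name; unknown IDs fall back to the ID itself.
    """
    class_id = example[id_key]
    return dict(example, class_words=id_to_words.get(class_id, class_id))


# Sample lines for illustration only.
sample = ["n01443537\tgoldfish, Carassius auratus\n"]
id_to_words = parse_words(sample)
print(add_class_words({"wnid": "n01443537"}, id_to_words))
```

With a real file you would pass open("words.txt") to parse_words, and something like dataset.map(add_class_words, fn_kwargs={"id_to_words": id_to_words}) should then apply it to every example.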
and it threw "ValueError: operands could not be broadcast together with shapes (224,224) (3,)" on the first line.
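It would be easier to debug this error from the actual code, but your notebook is not public, so my guess is that some of the images are grayscale. A shape of (224,224) has no channel dimension, so it cannot be normalized with a 3-channel mean and std. Replacing the line: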
inputs = feature_extractor([x for x in example_batch["image"]], return_tensors="pt")
with
inputs = feature_extractor([x.convert("RGB") for x in example_batch["image"]], return_tensors="pt")
should fix the issue.
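If you want to confirm the grayscale guess before changing anything, a small sketch like the one below may help. Both helper names are made up for illustration, and the code only assumes each image is PIL-like, meaning it has a .mode string and a .convert(mode) method as PIL.Image.Image does.

```python
def count_modes(images):
    """Tally image modes; grayscale PIL images report mode 'L'."""
    counts = {}
    for img in images:
        counts[img.mode] = counts.get(img.mode, 0) + 1
    return counts


def ensure_rgb(images):
    """Return a list in which every image is in RGB mode.

    Images that are already RGB are passed through unchanged;
    everything else is converted with .convert("RGB").
    """
    return [img if img.mode == "RGB" else img.convert("RGB") for img in images]
```

Calling count_modes(example_batch["image"]) on a batch should show whether any "L" images are present, and feature_extractor(ensure_rgb(example_batch["image"]), return_tensors="pt") is equivalent to the replacement line above.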