Turn off automatic PIL image decoding in load_dataset

When loading a dataset, I have the following Arrow schema: label: int, image: struct<bytes: binary, path: string>. When I use the load_dataset() method, the image column is automatically decoded into PIL image objects, and the path is lost. Is there a way to avoid that behavior?
In my example:

from datasets import load_dataset

batch_size = 32  # example value; not specified in the original post
dataset = load_dataset("chronopt-research/cropped-vggface2-224")
for i in range(0, len(dataset['train']), batch_size):
    batch = dataset['train'][i:i + batch_size]
    images = batch['image']  # Original 224x224 images
    labels = batch['label']  # Labels for each image

The images I get are only PIL image objects, which don't include the path or file name from the original Arrow files.

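Hi @benayat,
Are you looking for cast_column("image", Image(decode=False))? With decoding disabled, each image is returned as a dict holding the raw bytes and the path instead of a PIL image.

Please see the "Load image data" guide for an example snippet.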

from datasets import load_dataset, Image
dataset = load_dataset("chronopt-research/cropped-vggface2-224").cast_column("image", Image(decode=False))
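
Once decoding is disabled, each entry in batch['image'] should be a dict with "bytes" and "path" keys rather than a PIL image. A minimal sketch of how the original loop could read the file name and still decode on demand follows. The helper names split_image_field, to_pil, and iterate_batches are hypothetical (not part of the datasets library), and to_pil assumes Pillow is installed.

```python
import io
import os


def split_image_field(field):
    """Return (file_name, raw_bytes) from an undecoded image field.

    `field` is assumed to be a dict like {"bytes": ..., "path": ...},
    which is what the image column yields with Image(decode=False).
    """
    path = field.get("path")
    file_name = os.path.basename(path) if path else None
    return file_name, field.get("bytes")


def to_pil(field):
    """Decode an undecoded image field into a PIL image on demand."""
    from PIL import Image as PILImage  # imported lazily; requires Pillow

    if field.get("bytes") is not None:
        return PILImage.open(io.BytesIO(field["bytes"]))
    return PILImage.open(field["path"])


def iterate_batches(batch_size=32):
    """Hypothetical version of the loop from the question (downloads data)."""
    from datasets import load_dataset, Image

    dataset = load_dataset("chronopt-research/cropped-vggface2-224")
    train = dataset["train"].cast_column("image", Image(decode=False))
    for i in range(0, len(train), batch_size):
        batch = train[i:i + batch_size]
        for field, label in zip(batch["image"], batch["label"]):
            file_name, _raw = split_image_field(field)
            image = to_pil(field)  # decode only when pixels are needed
            yield file_name, image, label
```

Whether "path" holds a full path, a bare file name, or None depends on how the Arrow files were written, so it is worth checking one example before relying on it.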