Turn off automatic PIL image generation in load_dataset

benayat · August 20, 2024, 9:08pm

When loading a dataset, I have the following Arrow schema: label: int, image: struct<bytes:binary, path:string>. When I use the load_dataset() method, the image column is automatically converted to PIL images, and the path is lost. Is there a way to avoid that behavior?
My example:

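from datasets import load_dataset

batch_size = 32  # not given in the original post; any positive integer works

dataset = load_dataset("chronopt-research/cropped-vggface2-224")
for i in range(0, len(dataset['train']), batch_size):
    batch = dataset['train'][i:i + batch_size]
    images = batch['image']  # 224x224 images, returned as decoded PIL objects
    labels = batch['label']  # Labels for each image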

The images I get are only PIL image objects, which don't include the path or file name from the original Arrow files.

mahmutc · August 21, 2024, 8:37am

hi @benayat
Are you looking for cast_column("image", Image(decode=False))?

Please see the Load image data guide for an example snippet.

from datasets import load_dataset, Image
dataset = load_dataset("chronopt-research/cropped-vggface2-224").cast_column("image", Image(decode=False))
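
To illustrate what the loop from the question could look like after this cast: with decode=False, each entry in batch['image'] should come back as a plain dict with "bytes" and "path" keys instead of a PIL image. The sketch below is only an illustration under that assumption. The helper names iter_undecoded and to_pil are hypothetical (not part of datasets), and example_batch is a stand-in with made-up values so the snippet runs without downloading anything.

```python
import io


def iter_undecoded(batch):
    """Yield (path, raw_bytes, label) for each row of an undecoded batch."""
    for image, label in zip(batch["image"], batch["label"]):
        yield image["path"], image["bytes"], label


def to_pil(raw_bytes):
    """Decode one image on demand; Pillow is imported lazily, only when called."""
    from PIL import Image as PILImage
    return PILImage.open(io.BytesIO(raw_bytes))


# Stand-in batch with made-up values, shaped like
# dataset["train"][i:i + batch_size] after cast_column("image", Image(decode=False)).
example_batch = {
    "image": [
        {"bytes": b"fake-bytes-0", "path": "example_0.jpg"},
        {"bytes": b"fake-bytes-1", "path": "example_1.jpg"},
    ],
    "label": [3, 7],
}

records = list(iter_undecoded(example_batch))
paths = [path for path, _, _ in records]
```

In the real loop, example_batch would be replaced by the batch sliced from dataset['train'], and to_pil(raw_bytes) can be called on whichever images actually need decoding. Depending on how a dataset was built, "path" may be only a file name or may be None, so it is worth checking a few rows first.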

