How to load image dataset using csv to get proper dataset datatype

siddharth963 · September 27, 2022, 2:00pm

Hi, I’m working on image classification. I have a CSV file that contains image names and their label mappings.
I tried the approach suggested here, but I want the images to be loaded in PIL format, and the labels also have to be detected.

This is what I’m getting:

>> data_files = {'train': 'train.csv', 'test': 'test.csv'}
>> ds = load_dataset('csv', data_files=data_files, data_dir='/kaggle/input/cassava-leaf-disease-classification/train_images/')
>> ds
DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'image_id', 'label'],
        num_rows: 17118
    })
    test: Dataset({
        features: ['Unnamed: 0', 'image_id', 'label'],
        num_rows: 4279
    })
})

>> ds['train'].features
{'Unnamed: 0': Value(dtype='int64', id=None),
 'image_id': Value(dtype='string', id=None),
 'label': Value(dtype='int64', id=None)}

This is how I want it to be:

(This image is from a Colab notebook provided by Hugging Face.)

How can I get it right?

NimaBoscarino · September 27, 2022, 6:42pm

That notebook uses the ImageFolder loading strategy, but since you’re using a CSV file you can cast the image_id column to the Image() feature after you’ve loaded the dataset, i.e. you can just run:

from datasets import Image
ds = ds.cast_column("image_id", Image())

(From Load image data)

And when you check the features again you’ll see that image_id will be an image!
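
A minimal sketch of that cast, assuming image_id holds bare file names that must be joined with the image directory before they can be decoded, and that the labels are integers from 0 to num_classes - 1. The helper names add_image_path and to_image_dataset, the new image column, and the num_classes parameter are illustrative and not part of the datasets API:

```python
import os


def add_image_path(example, image_dir):
    """Store the full path of the image in a new 'image' column."""
    example["image"] = os.path.join(image_dir, example["image_id"])
    return example


def to_image_dataset(ds, image_dir, num_classes):
    """Turn a CSV-backed Dataset or DatasetDict into one with Image and ClassLabel features."""
    # Imported here so the path helper above stays usable without `datasets`.
    from datasets import ClassLabel, Image

    # Build full file paths first; Image() decodes from paths, not bare names.
    ds = ds.map(add_image_path, fn_kwargs={"image_dir": image_dir})
    # Paths are decoded lazily into PIL images when a row is accessed.
    ds = ds.cast_column("image", Image())
    # Assumption: integer labels in the range [0, num_classes).
    ds = ds.cast_column("label", ClassLabel(num_classes=num_classes))
    return ds
```

It could be called as ds = to_image_dataset(ds, image_dir, num_classes), with image_dir set to the folder that holds the image files and num_classes set to the number of distinct labels in the CSV.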

siddharth963 · September 28, 2022, 11:46am

Thanks for the reply.
Following that, I get this error:

>> ds = ds.cast_column("image_id", Image())
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-22-8d1f43006898> in <module>
----> 1 ds = ds.cast_column("image_id", Image())

19 frames
/usr/local/lib/python3.7/dist-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from int64 to struct using function cast_struct
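
The message says the column being cast held int64 values, whereas Image() is stored as a struct. The features printed earlier list image_id as a string, so one possibility (an assumption, not confirmed in this thread) is that a different column or a differently loaded dataset was cast. A small sketch of a check that could be run before the cast; require_string_column is a hypothetical helper, not part of the datasets API:

```python
def require_string_column(features, column):
    """Raise a readable error unless `column` is stored as strings (file paths)."""
    dtype = getattr(features[column], "dtype", None)
    if dtype != "string":
        raise TypeError(
            f"Column {column!r} has dtype {dtype!r}; expected 'string' file paths "
            "before casting to Image()."
        )
    return column


# Usage (assumes `ds` is the DatasetDict loaded from the CSV files):
# from datasets import Image
# require_string_column(ds["train"].features, "image_id")
# ds = ds.cast_column("image_id", Image())
```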
