Metadata CSV annotations for ImageFolder dataset

tankwell · March 13, 2024, 7:39am

Hello,

I am adding a metadata file to my existing ImageFolder data, which I load with:

from datasets import load_dataset

dataset = load_dataset(
    "imagefolder",
    data_dir=dataset_name_or_path,
    split="train",
    drop_labels=False,
    drop_metadata=False,
    keep_in_memory=False,
    verification_mode="no_checks",
)

Until now I have not used an annotation file, so the label index was derived from the subfolder structure.

I tried adding an annotations.csv file with the following contents:

file_id,label
relative_or_full_path,label_index

Here label_index is an int.

Things I have tested:

- Naming the file metadata.csv and adding data_files={"train": "metadata.csv"}
- Loading the dataset as usual, on the assumption that the imagefolder build script would read that metadata.csv
- Loading the dataset as usual (worked well)
- Changing the directory structure into positive and negative folders so that the labels are 0/1 in the output dataset

What is the expected schema of metadata.csv?
Note that I don't want to load this as a CSV dataset: I want the ImageFolder build script to read the images and load them as np.arrays / PIL images, not as paths. I just want the dataset annotations to be stored in that file.

If JSON annotations are better for some technical reason, I'll switch to those.

Thanks!

lhoestq · March 18, 2024, 11:09am

Hi ! You can find the documentation here: Image Dataset

Basically you need a metadata.csv file with a file_name column.
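
A minimal sketch of what that can look like, using only the standard library: it writes a metadata.csv next to the images, with a file_name column holding each image's path relative to the metadata file. The helper name write_metadata, the label_of callback, the label column name, and the list of image extensions are all assumptions made for illustration, not part of the datasets API.

```python
import csv
from pathlib import Path

# Assumption: the image extensions present in the folder; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_metadata(split_dir, label_of):
    """Write split_dir/metadata.csv with file_name and label columns.

    label_of is a hypothetical callback that maps an image path
    (relative to split_dir, POSIX style) to an integer label.
    Returns the number of image rows written.
    """
    split_dir = Path(split_dir)
    images = sorted(
        p for p in split_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    with open(split_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # The first column must be named file_name, not file_id.
        writer.writerow(["file_name", "label"])
        for path in images:
            rel = path.relative_to(split_dir).as_posix()
            writer.writerow([rel, label_of(rel)])
    return len(images)
```

With the file in place, the same load_dataset("imagefolder", data_dir=...) call from the first post should pick up the extra columns while still decoding the images, without passing data_files.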

tankwell · March 19, 2024, 4:52pm

Thanks!
Solved it! In addition to the missing file_name column, in some folders the file was not named metadata.csv.

After I changed this, all the categories are read correctly (I had to include all of them in the training folder for them to be considered in the training set).
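
The naming problem described here can be caught before loading with a small check. This is only a sketch: the helper name find_misnamed_metadata is hypothetical, and the set of accepted names (metadata.csv and metadata.jsonl) is an assumption about what the ImageFolder builder looks for, so adjust it to the datasets version in use.

```python
from pathlib import Path

# Assumption: file names the ImageFolder builder recognises as metadata.
EXPECTED_NAMES = {"metadata.csv", "metadata.jsonl"}


def find_misnamed_metadata(data_dir):
    """Return CSV/JSONL files under data_dir that are not named like metadata files.

    Paths are returned relative to data_dir, POSIX style, in sorted order.
    """
    data_dir = Path(data_dir)
    return sorted(
        p.relative_to(data_dir).as_posix()
        for p in data_dir.rglob("*")
        if p.is_file()
        and p.suffix.lower() in {".csv", ".jsonl"}
        and p.name not in EXPECTED_NAMES
    )
```

An empty list means every annotation file found uses one of the expected names; anything listed (for example an annotations.csv left over from an earlier attempt) is a candidate for renaming.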
