Metadata CSV annotations for ImageFolder dataset

Hello,

I am building a new metadata file for my current ImageFolder data, which I load with

from datasets import load_dataset

dataset = load_dataset(
    'imagefolder',
    data_dir=dataset_name_or_path,
    split="train",
    drop_labels=False,
    drop_metadata=False,
    keep_in_memory=False,
    verification_mode="no_checks"
)

Until now I have not used an annotation file, so the label index was derived from the subfolder structure.

I have tried adding an annotations.csv file with the following content:

file_id,label
relative_or_full_path,label_index

The label_index is an int.

Things I have tried:

  1. Naming the file metadata.csv and adding data_files={"train": "metadata.csv"}
  2. Loading the dataset as usual, on the assumption that the imagefolder build script would read that metadata.csv
  3. Loading as usual (worked well)
  4. Changing the directory structure into positive and negative folders so that the labels are 0/1 in the output dataset

What is the schema of metadata.csv?
It is important to mention that I don't want to load this as a CSV dataset: I want the ImageFolder build script to read the images and load them as np.arrays / PIL images, not as paths. I just want the dataset annotations to be stored there.

If for some technical reason JSON annotations are better, I'll switch to those.

Thanks!

Hi! You can find the documentation here: Image Dataset

Basically you need a metadata.csv with a file_name column (so file_name rather than file_id in your header).
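
To make that concrete, here is a minimal sketch of writing such a file with the standard csv module. The helper name write_metadata, the train folder and the image names are made up for illustration. The file_name values are paths relative to the folder that holds metadata.csv, and the other columns (here label) become features of the dataset. The load_dataset call is only shown as a comment, since it needs the datasets library and real images on disk.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata(folder, rows):
    """Write metadata.csv into `folder`; `rows` is a list of (file_name, label) pairs."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "metadata.csv"  # the file name has to be exactly metadata.csv
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "label"])  # file_name is the required column
        writer.writerows(rows)
    return path


# Hypothetical layout: <root>/train/img_0.jpg and <root>/train/img_1.jpg
root = Path(tempfile.mkdtemp())
metadata_path = write_metadata(
    root / "train",
    [("img_0.jpg", 0), ("img_1.jpg", 1)],
)

# Read the file back to see what was written
with metadata_path.open(newline="") as f:
    written = list(csv.reader(f))

# With real images in place, the usual call should pick the file up:
# from datasets import load_dataset
# dataset = load_dataset("imagefolder", data_dir=str(root), split="train")
```

Note that no data_files argument pointing at the CSV is needed here: the file sits next to the images and is read as annotations, so the images are still decoded as images rather than returned as paths.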

Thanks!
Solved it! In addition to the file_name column, in some folders the file was not named metadata.csv.

After I changed this, all the categories are read correctly (I had to include all of them in the training folder for them to be considered in the training set).
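
For anyone hitting the same two problems, a quick sanity check along these lines might help before calling load_dataset. It is only a sketch using the standard library: check_metadata is a hypothetical helper, not part of datasets, and the demo folders and files are placeholders. It flags CSV files that are not named metadata.csv, metadata files without a file_name column, and rows that point at files that do not exist.

```python
import csv
import tempfile
from pathlib import Path


def check_metadata(data_dir):
    """Return a list of problems found in the CSV files under `data_dir`."""
    problems = []
    for csv_path in sorted(Path(data_dir).rglob("*.csv")):
        if csv_path.name != "metadata.csv":
            problems.append(f"{csv_path.name}: not named metadata.csv")
            continue
        with csv_path.open(newline="") as f:
            reader = csv.DictReader(f)
            if "file_name" not in (reader.fieldnames or []):
                problems.append(f"{csv_path.name}: no file_name column")
                continue
            for row in reader:
                # file_name is resolved relative to the folder holding metadata.csv
                if not (csv_path.parent / row["file_name"]).is_file():
                    problems.append(f"{row['file_name']}: listed but missing")
    return problems


# Demo on a throwaway directory with one misnamed file and one missing image
root = Path(tempfile.mkdtemp())
(root / "train").mkdir()
(root / "test").mkdir()
(root / "train" / "annotations.csv").write_text("file_id,label\nimg_0.jpg,0\n")
(root / "test" / "metadata.csv").write_text(
    "file_name,label\nimg_1.jpg,1\nimg_2.jpg,0\n"
)
(root / "test" / "img_1.jpg").touch()  # empty placeholder, not a real image

problems = check_metadata(root)
for problem in problems:
    print(problem)
```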