Hello,
I am adding a metadata file to my existing ImageFolder dataset, which I currently load with:
from datasets import load_dataset

dataset = load_dataset(
    'imagefolder',
    data_dir=dataset_name_or_path,
    split="train",
    drop_labels=False,
    drop_metadata=False,
    keep_in_memory=False,
    verification_mode="no_checks"
)
Until now I have not used an annotation file, so the label index was inferred from the subfolder structure.
I tried adding an annotations.csv file with the following contents:
file_id,label
relative_or_full_path,label_index
Here label_index is an integer.
Things I have tested:
- naming the file metadata.csv and passing data_files={"train": "metadata.csv"}
- loading the dataset as usual, on the assumption that the imagefolder build script would read metadata.csv
- loading as usual (worked well)
- changing the directory structure into positive and negative folders so that the labels are 0/1 in the output dataset
What is the expected schema of metadata.csv?
It is important to mention that I don't want to load this as a CSV dataset: I want the ImageFolder build script to read the images and load them as np.arrays / PIL images, not as paths. I just want the dataset annotations to be stored in that file.
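To make the goal concrete, here is a rough sketch of how I would generate the annotation file from the current folder layout. It rests on an assumption I have not confirmed: that the file must be named metadata.csv, sit inside data_dir, and have a file_name column holding paths relative to that file (rather than the file_id column I used above). The names write_metadata, label_to_index and IMAGE_EXTENSIONS are hypothetical helpers of mine, not part of the datasets library.

```python
import csv
from pathlib import Path

# Hypothetical, illustrative subset of image extensions (my own choice).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def write_metadata(data_dir, label_to_index, metadata_name="metadata.csv"):
    """Write one CSV row per image found under data_dir.

    Assumption: the loader wants a `file_name` column with paths relative
    to the metadata file; `label` is an extra integer column.
    """
    root = Path(data_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not an image file.
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        class_name = path.parent.name  # label comes from the immediate subfolder
        if class_name not in label_to_index:
            continue
        rows.append((path.relative_to(root).as_posix(), label_to_index[class_name]))

    with open(root / metadata_name, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "label"])
        writer.writerows(rows)
    return rows
```

With a positive/negative layout I would call it as write_metadata(dataset_name_or_path, {"negative": 0, "positive": 1}) and then run the same load_dataset('imagefolder', data_dir=...) call as above, without data_files. Whether the label column then comes back as a plain integer or as a class label is something I am not sure about.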
If for some technical reason JSON annotations are better, I'll switch to those.
Thanks!