Hello,
I have a folder with 6,000 images and a metadata.csv file that contains two columns: file_name and label.
I am trying to create a dataset using this command:
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir="data/food_imgs", split='train')
But, as a result, I only get one row:
Dataset({
    features: ['image', 'label'],
    num_rows: 1
})
How can I fix this? I should have 6,000 rows in the dataset.
I ran your code on an image folder on my computer. The folder has two subfolders, ants and bees, which together hold 245 images. After running your script, I get the output below, so it seems to be working as expected.
Dataset({
    features: ['image', 'label'],
    num_rows: 245
})
When I add the two lines below to the code, I get the following output, which shows that two different labels are attached to the images.
print(dataset[0])
print(dataset[150])
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=768x512 at 0x1F0AC70A3D0>, 'label': 0}
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x375 at 0x1F0AC70A1C0>, 'label': 1}
What version of datasets do you have?
It seems the version of datasets is 2.9.0. I installed it using PyCharm package manager.
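For anyone who wants to check the version from Python instead of the IDE, here is a minimal sketch using only the standard library. The helper name installed_version is made up for this example; it returns None when the package is not installed in the active interpreter.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="datasets"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version of `datasets` seen by this interpreter, or None.
print(installed_version())
```

This reports the version for whichever interpreter runs the snippet, which may differ from the one PyCharm shows if several environments are configured.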
I finally found the issue. One of the images had the word "training" in its file name. In that case, the load_dataset function seems to treat "training" as a split name and assumes I only want to load the images whose names contain it.
That is why I was getting only one image back. This should be better documented.
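In case it helps someone else, here is a rough sketch of how to spot such files and work around the problem. The keyword list is my assumption about which words can be read as split names; the exact patterns depend on the datasets version. Both function names are made up for this example. The second function passes data_files explicitly so that everything is assigned to the train split; I have not verified it against every datasets version, so treat it as a starting point.

```python
import re
from pathlib import Path

# Assumed list of words that may be interpreted as split names.
# The real patterns used by `datasets` may differ between versions.
SPLIT_KEYWORDS = (
    "train", "training", "validation", "valid", "dev", "val",
    "test", "testing", "eval", "evaluation",
)
# Match a keyword only when it is not embedded in a longer word.
_SPLIT_RE = re.compile(
    r"(?<![a-z])(" + "|".join(SPLIT_KEYWORDS) + r")(?![a-z])",
    re.IGNORECASE,
)


def find_split_like_names(data_dir):
    """List files (relative to data_dir) whose path contains a split-like word."""
    root = Path(data_dir)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Ignore the extension, but include directory names in the check.
        if _SPLIT_RE.search(rel.with_suffix("").as_posix()):
            hits.append(rel.as_posix())
    return hits


def load_all_as_train(data_dir):
    """Load every file under data_dir as the train split (requires `datasets`)."""
    from datasets import load_dataset  # imported lazily; only needed here

    pattern = str(Path(data_dir) / "**")
    return load_dataset("imagefolder", data_files={"train": pattern}, split="train")
```

Calling find_split_like_names("data/food_imgs") should list the offending file, which can then be renamed. Alternatively, load_all_as_train("data/food_imgs") avoids the automatic split detection by naming the split explicitly.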