I have a dataset for semantic segmentation with the following structure:
Dataset
├── image
│   └── *.jpg
└── annotation
    └── *.png
When I call load_dataset with the imagefolder builder like this:
import os
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir=os.path.join(path, 'repositories/data/dataset/'), split='train')
dataset[0]
it returns the following:
{'image': <PIL.PngImagePlugin.PngImageFile image mode=I;16 size=1098x566>,
'label': 0} # the JPG files are detected, but the PNG files are not
I tried renaming the folders to label, labels, and annotation, but it didn't work. The documentation is clear on how to set up the splits, but I didn't understand how this split between images and labels is supposed to be laid out.
I worked around the problem as follows, based on what I found in these forums:
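import os
import glob
import datasets
# glob does not guarantee any ordering, so sort both lists to keep
# each image aligned with its segmentation map
IMAGES = sorted(glob.glob(os.path.join(path, 'repositories/data/dataset/image/*.jpg')))
SEG_MAPS = sorted(glob.glob(os.path.join(path, 'repositories/data/dataset/annotation/*.png')))
# Both columns hold file paths that are decoded as images on access
dataset = datasets.Dataset.from_dict(
    {"image": IMAGES, "annotation": SEG_MAPS},
    features=datasets.Features(
        {"image": datasets.Image(),
         "annotation": datasets.Image()})
)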
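A slightly more defensive sketch of the same workaround is below. It pairs files explicitly instead of relying on list order. It assumes (this is my assumption, not something the layout above guarantees) that each image and its mask share the same file name apart from the extension, e.g. 0001.jpg and 0001.png. The helper names pair_by_stem and build_dataset are made up for illustration.

```python
from pathlib import Path


def pair_by_stem(image_dir, annotation_dir):
    """Pair each *.jpg in image_dir with the *.png of the same stem in annotation_dir.

    Assumption: an image and its mask differ only in extension.
    """
    masks = {p.stem: p for p in Path(annotation_dir).glob("*.png")}
    images, annotations = [], []
    for img in sorted(Path(image_dir).glob("*.jpg")):
        mask = masks.get(img.stem)
        if mask is None:
            continue  # skip images that have no matching mask
        images.append(str(img))
        annotations.append(str(mask))
    return {"image": images, "annotation": annotations}


def build_dataset(image_dir, annotation_dir):
    """Build a two-column image dataset from the paired file paths."""
    import datasets  # imported lazily so pair_by_stem works without it

    features = datasets.Features(
        {"image": datasets.Image(), "annotation": datasets.Image()}
    )
    return datasets.Dataset.from_dict(
        pair_by_stem(image_dir, annotation_dir), features=features
    )
```

With this, build_dataset(os.path.join(path, 'repositories/data/dataset/image'), os.path.join(path, 'repositories/data/dataset/annotation')) should give the same two-column dataset as above, while silently dropping any image that has no mask.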
My question is: where did I make a mistake, or which part of the documentation did I miss? I would like to pass just the imagefolder builder and the data_dir.