Load_dataset with labels

iltranqui · April 16, 2024, 4:16pm

I have a dataset for semantic segmentation with the following structure:

Dataset
├── image
│ ├── *.jpg
├── annotation
│ ├── *.png

When I load it with load_dataset and the imagefolder builder, like this:

import os
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir=os.path.join(path, 'repositories/data/dataset/'), split='train')
dataset[0]

I get the following:

{'image': <PIL.PngImagePlugin.PngImageFile image mode=I;16 size=1098x566>,
 'label': 0}    # it detects the jpg files, but not the PNG files

I tried renaming the folders to label, labels, and annotation, but it didn’t work. The documentation is clear about how to set up the splits, but I didn’t understand how this separation into images and labels is supposed to work.
I worked around the problem as follows, based on what I found in these forums:

import glob
import os

import datasets

# Sort both lists so each image lines up with its annotation (glob order is not guaranteed).
IMAGES = sorted(glob.glob(os.path.join(path, 'repositories/data/dataset/image/*.jpg')))
SEG_MAPS = sorted(glob.glob(os.path.join(path, 'repositories/data/dataset/annotation/*.png')))

dataset = datasets.Dataset.from_dict(
    {"image": IMAGES, "annotation": SEG_MAPS},
    features=datasets.Features({"image": datasets.Image(), "annotation": datasets.Image()}),
)

My question is: where did I make a mistake, or which part of the documentation did I miss? I would like to pass just "imagefolder" and the data_dir.
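
As far as I can tell, imagefolder derives a class label from each subfolder name, which would explain the 'label': 0 above, so a paired image/mask layout like this one seems to need a manual step. Pairing two separately globbed lists by position only works if both folders contain exactly matching files. Below is a minimal sketch that pairs files by name instead. It assumes each image and its annotation share the same file stem (for example 0001.jpg and 0001.png), which the post does not confirm, and pair_by_stem is a hypothetical helper name, not part of the datasets library.

```python
from pathlib import Path


def pair_by_stem(image_dir, annotation_dir):
    """Match each *.jpg image to the *.png annotation with the same file stem.

    Assumption: an image and its mask share a stem, e.g. 0001.jpg / 0001.png.
    Files without a counterpart in the other folder are skipped.
    """
    images = {p.stem: p for p in Path(image_dir).glob("*.jpg")}
    annotations = {p.stem: p for p in Path(annotation_dir).glob("*.png")}

    # Keep only stems present in both folders, in a deterministic order.
    common = sorted(images.keys() & annotations.keys())
    return {
        "image": [str(images[stem]) for stem in common],
        "annotation": [str(annotations[stem]) for stem in common],
    }
```

The returned dictionary has the same shape as the one passed to datasets.Dataset.from_dict in the workaround above, so it could be used in place of the two glob lists.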
