I am also trying to create an image segmentation dataset and I am struggling with how to do that.
The dataset I want to upload to HF is the aisegmentcn-matting-human dataset, which you can also find on Kaggle.
I also have the dataset on my machine in a folder named data:
data/
├── clip_img/
│   └── {group-id}/
│       └── clip_{subgroup-id}/
│           └── {group-id}-{img-id}.jpg
└── matting/
    └── {group-id}/
        └── matting_{subgroup-id}/
            └── {group-id}-{img-id}.png
Each matting image {group-id}-{img-id}.png is identical to the corresponding input image {group-id}-{img-id}.jpg, but with the background removed. So, with the labels background and foreground, a pixel whose color is 0 is background and anything else is foreground. Just 2 classes of labels.
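To make the labeling rule concrete, here is a minimal sketch of how I would turn one matting image into a 2-class mask. The function name `matting_to_mask` is my own, and I am assuming that "color is 0" means every channel of the pixel is zero; if only the alpha channel marks the background, the check would need to change.

```python
import numpy as np


def matting_to_mask(matting):
    """Map a matting array to an H x W mask: 0 = background, 1 = foreground.

    Assumption: a pixel is background only when all of its channels are 0.
    """
    matting = np.asarray(matting)
    if matting.ndim == 2:
        # Single-channel image: compare each pixel directly.
        return (matting != 0).astype(np.uint8)
    # Multi-channel image (H x W x C): foreground if any channel is non-zero.
    return (matting != 0).any(axis=-1).astype(np.uint8)
```

With Pillow installed, something like `matting_to_mask(PIL.Image.open(path))` should work as well, since `np.asarray` accepts a PIL image.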
The primary reason I want this dataset on HF is to use it in an HF Space and a notebook.
It seems I have to process the raw data into an expected format. If I just try to:
>>> from datasets import load_from_disk
>>> ds = load_from_disk('./data')
...
FileNotFoundError: Directory data is neither a dataset directory nor a dataset dict directory.
So, I guess I have to create a DatasetDict with image and label columns, right?
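Here is a rough sketch of what I have in mind, based on the folder layout above. The helper names `pair_files` and `build_dataset` are hypothetical, and putting everything into a single `train` split is just my assumption; the `datasets` calls used (`Dataset.from_dict`, `cast_column`, `Image`, `DatasetDict`) are the ones I believe are meant for this.

```python
from pathlib import Path


def pair_files(root):
    """Return sorted (image_path, mask_path) string pairs found under root."""
    root = Path(root)
    pairs = []
    for img in sorted((root / "clip_img").rglob("*.jpg")):
        group, clip_dir = img.parent.parent.name, img.parent.name
        # clip_{subgroup-id} -> matting_{subgroup-id}
        matting_dir = clip_dir.replace("clip_", "matting_", 1)
        mask = root / "matting" / group / matting_dir / (img.stem + ".png")
        if mask.is_file():  # skip images without a matching matting file
            pairs.append((str(img), str(mask)))
    return pairs


def build_dataset(root):
    """Build a DatasetDict with `image` and `label` columns (needs `datasets`)."""
    from datasets import Dataset, DatasetDict, Image

    pairs = pair_files(root)
    ds = Dataset.from_dict({
        "image": [p[0] for p in pairs],
        "label": [p[1] for p in pairs],
    })
    # Casting to Image() makes the path columns decode lazily to PIL images.
    ds = ds.cast_column("image", Image()).cast_column("label", Image())
    return DatasetDict({"train": ds})
```

If this is the right direction, I would expect `build_dataset("./data").save_to_disk("./data_hf")` to produce a folder that `load_from_disk` accepts, and `push_to_hub` to upload it. Note that the label column here holds the matting PNGs as-is; converting them to 0/1 masks would be a separate step.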
Is there a How To guide for this kind of image dataset?