How do I create an Image Segmentation Dataset

Hi guys, I have a dataset that has the base images, their segmentation masks, and also the labels. How do I create an HF dataset from this, so that I can use segmentation transformers? Please help.


Welcome, @Archan! So the SegFormer model in Transformers is going to expect features like the ones found in this dataset: segments/sidewalk-semantic · Datasets at Hugging Face. Namely, you need pixel_values, which is just the image, and label, which is the segmentation map (one label per pixel). This blog post might be helpful: Fine-Tune a Semantic Segmentation Model with a Custom Dataset

If you have N masks per image (one for each of N labels), then I think you have to merge them into one segmentation map first. The blog post shows how you can do that with masks made with Segments.ai, but if you created the masks some other way you can write a similar script to do the same thing.

Hope this helps!


Hey @NimaBoscarino, I currently have 2 sets of images, one the originals and the other the masks, plus a CSV file with the label of each mask. There is only one mask per image. I have gone through the blog; it uses Segments.ai, which I don’t have access to, nor can I use it since the data is private. Also, I don’t need to mask anything separately, I already have the masks. I just want to convert it all to an HF dataset format, that’s it.

Hey @NimaBoscarino, I was thinking about using this builder code:

import datasets
import glob

# Sort both lists so that image i lines up with segmentation map i.
IMAGES = sorted(glob.glob("path/to/archive/dataset/semantic_drone_dataset/original_images/*.jpg"))
SEG_MAPS = sorted(glob.glob("path/to/archive/dataset/semantic_drone_dataset/label_images_semantic/*.png"))

dataset = datasets.Dataset.from_dict(
    {"image": IMAGES, "label": SEG_MAPS},
    features=datasets.Features({"image": datasets.Image(), "label": datasets.Image()}),
)
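One thing worth guarding against: glob.glob returns paths in arbitrary order, so the two lists may not line up. A quick sanity check could look like this sketch (it assumes images and masks share the same filename stem; the paths here are illustrative):

```python
import os

def check_pairing(images, seg_maps):
    """Verify each image has a mask with the same filename stem."""
    stems = lambda paths: [os.path.splitext(os.path.basename(p))[0] for p in paths]
    assert stems(images) == stems(seg_maps), "image/mask filenames do not line up"
    return list(zip(images, seg_maps))

pairs = check_pairing(
    sorted(["imgs/002.jpg", "imgs/001.jpg"]),
    sorted(["maps/002.png", "maps/001.png"]),
)
# pairs[0] is ("imgs/001.jpg", "maps/001.png")
```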

Yeah, I think that builder code is in the right direction! Looks like this documentation gives you the exact code that you need for it: transformers/examples/pytorch/semantic-segmentation at main · huggingface/transformers · GitHub

You can also give ImageFolder with Metadata a shot if you want to host this dataset on the Hub. This is also nice if you have other features you want to include.
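For reference, a minimal sketch of what ImageFolder with metadata could look like here (the folder layout, file names, and the metadata_rows helper are all illustrative): each line of metadata.jsonl pairs an image’s file_name with extra columns, such as the mask path.

```python
import json

# Hypothetical layout (names are illustrative):
#   my_dataset/
#   ├── metadata.jsonl
#   ├── img_0001.jpg
#   ├── img_0002.jpg
#   └── masks/
#       ├── img_0001.png
#       └── img_0002.png

def metadata_rows(image_names):
    """Build one metadata.jsonl line per image, pointing at its mask."""
    return [
        json.dumps({"file_name": name, "label": f"masks/{name[:-4]}.png"})
        for name in image_names
    ]

rows = metadata_rows(["img_0001.jpg", "img_0002.jpg"])
# Write these rows to my_dataset/metadata.jsonl, then something like:
#   from datasets import load_dataset
#   ds = load_dataset("imagefolder", data_dir="my_dataset", split="train")
```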


@nateraw Thank you, I will surely take a look.

@NimaBoscarino and @nateraw, a very naive question: so we have processed the images and now every image represents one single mask, like
|- Img 1
|- Img 2

|- Mask for Img 1
|- Mask for Img 2

and these masks are for single classes. So now I have to create a JSON file which maps each image number to its respective mask. This part I am not understanding at all: how will the model understand which mask is for which class?

Do you have more or fewer than 255 classes?

7 classes and total 4K points each nearly 550 times

You can construct a greyscale image by combining the masks, then use that image as a new feature called “label” with the datasets.Image feature type. If you save this new mask as a file and use ImageFolder with metadata, referring to the mask path, it should cast to datasets.Image for you out of the box.

Now to do that… here’s a very quick explanation.

Let’s say you have an image whose shape is (1080, 720, 3). You could construct a mask of shape (1080, 720) where each value is in [0, num_classes), denoting which object is at that pixel location.

So if you had two masks, Cat:


0 1 0
0 0 0
0 1 1

and Dog:

1 0 1
0 0 1
1 0 0

Let’s also assume you want a background class to make it clearer. You’d construct labels = ['background', 'cat', 'dog'] and make the following array, which you could save as a greyscale image:

2 1 2
0 0 2
2 1 1

You can see a dataset like that in the Segments.ai dataset Nima mentioned above.
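The explanation above can be sketched in code (a sketch using NumPy; merge_masks is an illustrative helper, and it assumes the masks are 0/1 arrays listed in class order, with background staying 0):

```python
import numpy as np

def merge_masks(masks):
    """Combine per-class binary masks into one segmentation map.
    masks[i] is a 0/1 array for class id i + 1 (class 0 is background)."""
    label_map = np.zeros(np.asarray(masks[0]).shape, dtype=np.uint8)
    for class_id, mask in enumerate(masks, start=1):
        label_map[np.asarray(mask) == 1] = class_id
    return label_map

# labels = ['background', 'cat', 'dog']
cat = [[0, 1, 0], [0, 0, 0], [0, 1, 1]]
dog = [[1, 0, 1], [0, 0, 1], [1, 0, 0]]
label_map = merge_masks([cat, dog])
# label_map:
# [[2 1 2]
#  [0 0 2]
#  [2 1 1]]
# To save it as a greyscale PNG:
#   PIL.Image.fromarray(label_map, mode="L").save("label.png")
```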

@nateraw @NimaBoscarino Hey, thank you for the help! I was able to mask the dataset and construct a DatasetDict from it. But now there is another small issue.

I am getting this error, and I am not sure why. I am trying with the base code only.

This is the function where it is throwing the error.

This could have to do with the fact that the images aren’t converted to RGB before passing them to the feature extractor. Can you do inputs = feature_extractor([image.convert("RGB") for image in images], labels)?

I was able to solve that part by doing this:

feature_extractor = SegformerFeatureExtractor(do_resize=False, do_normalize=False)

But I am getting another error, and I’m not sure why it’s coming up.

Even after asserting the Image class, the same error is being thrown.


@nielsr I even tried with the segments dataset that you had shown in the demo, I got the same error

RuntimeError: Could not infer dtype of Image

As always, become one with the data. Make sure the data is prepared in the right way.

You can do the following things:

  • print examples of your dataset
  • create a PyTorch DataLoader and check whether the data is created appropriately
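One common cause of "Could not infer dtype of Image" is that raw PIL images reach the DataLoader’s default collate_fn. A sketch of one way around it (make_transform is a hypothetical helper; the real extractor would be e.g. SegformerFeatureExtractor): run the feature extractor inside a transform, so batches contain arrays rather than PIL images.

```python
def make_transform(feature_extractor):
    """Return a function suitable for datasets.Dataset.set_transform that
    converts images to RGB and runs them through the feature extractor."""
    def transform(batch):
        images = [img.convert("RGB") for img in batch["pixel_values"]]
        return feature_extractor(images, batch["label"])
    return transform

# Usage with a real dataset would look roughly like:
#   feature_extractor = SegformerFeatureExtractor()
#   ds.set_transform(make_transform(feature_extractor))
#   loader = torch.utils.data.DataLoader(ds, batch_size=2)
#   batch = next(iter(loader))  # tensors now, not PIL images
```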

So when I tried with your dataset I did the same and checked if I was missing any steps, but I don’t know why I got the same error.

In the case of the segments dataset:
pixel_values looks like this : <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1080 at 0x7FADD7C17F90>

label looks like: <PIL.PngImagePlugin.PngImageFile image mode=L size=1920x1080 at 0x7F54823057D0>

and my dataset looks like

pixel_values: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=305x305 at 0x7FADEE32EF90>

label : <PIL.PngImagePlugin.PngImageFile image mode=L size=305x305 at 0x7FADEE0E9B90>

I am genuinely lost as to what is going wrong. I have tried both declaring it as a DatasetDict and using a DataLoader.

The datasets look the same, yet I am getting the same error.

I am also trying to create an image segmentation dataset and I am struggling with how to do that.

The dataset I want to upload to HF is the aisegmentcn-matting-human dataset, which you can also find on Kaggle.

I also have the dataset on my machine in a folder named data:

└── data/
    ├── clip_img/
    │   └── {group-id}/
    │       └── clip_{subgroup-id}/
    │           └── {group-id}-{img-id}.jpg
    └── matting/
        └── {group-id}/
            └── matting_{subgroup-id}/
                └── {group-id}-{img-id}.png

All matting images of the kind {group-id}-{img-id}.png are exactly equal to the input image {group-id}-{img-id}.jpg but with the background removed. So, if the labels are background and foreground: if the pixel value is 0 it is background, anything else is foreground. Just 2 classes of labels.
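Given that rule (0 means background, anything else foreground), turning a matting image into a 0/1 label map could look like this sketch (matting_to_label is an illustrative helper; it assumes the matting is loaded as a single-channel array, e.g. the alpha channel of the PNG):

```python
import numpy as np

def matting_to_label(matting):
    """Pixel value 0 -> background (0), anything else -> foreground (1)."""
    return (np.asarray(matting) != 0).astype(np.uint8)

matting = [[0, 0, 128], [0, 255, 37]]
label = matting_to_label(matting)
# label:
# [[0 0 1]
#  [0 1 1]]
```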

The primary reason I want this dataset in HF is to use it in a HF space and notebook.
It seems I have to process the raw data into an expected format. If I just try:

>>> from datasets import load_from_disk
>>> ds = load_from_disk('./data')
FileNotFoundError: Directory data is neither a dataset directory nor a dataset dict directory.

So, I guess I have to create a DatasetDict with image and label columns, right?
Is there a How To guide for this kind of image dataset?