Custom dataset for Mask2Former finetuning


I have started creating a synthetic dataset consisting of the original images and their segmentation masks (one black/white mask as a separate PNG image for each instance in the original image). My aim is to use this dataset to fine-tune Mask2Former.

As far as I understand from this link, I need to merge all masks into one segmentation map per original image. Is that right? If so, that shouldn’t be a problem.

For this step I don’t want to use the approach suggested in this blog, because I created my dataset with Blender, and it’s quite simple to generate segmentation maps from there with a Python script.

So now I have original images and one segmentation map for each image.

What are the next steps for me? I somehow have to link the RGB colors of the masks to class names (in a JSON file? in which format?), but what exactly does this need to look like?
And what else do I need to do to create a Hugging Face dataset that’s suited for fine-tuning Mask2Former?

As I am quite new to the world of AI/datasets/huggingface I might need you to explain it in small steps, even if the problem seems easy for you to solve :blush:. Any help is appreciated. Thanks a lot!

Convert all the masks into single-channel grayscale images. Combine the masks for each image so that you end up with a single mask per image. Each pixel in the combined mask then holds a single number, and that number corresponds to a unique class name.
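The merging step could be sketched like this. This is just an illustration, not an official script: the function name and the `(mask_path, class_id)` pairing are my own invention, and it assumes your per-instance masks are white-on-black PNGs of the same size as the original image.

```python
import numpy as np
from PIL import Image


def merge_instance_masks(masks_with_labels, height, width):
    """Merge per-instance black/white masks into one single-channel
    segmentation map.

    `masks_with_labels` is a list of (png_path, class_id) pairs.
    Pixel value 0 is kept for background; where instances overlap,
    later masks overwrite earlier ones.
    """
    seg_map = np.zeros((height, width), dtype=np.uint8)
    for path, class_id in masks_with_labels:
        mask = np.array(Image.open(path).convert("L"))
        # Treat anything brighter than mid-gray as part of the instance.
        seg_map[mask > 127] = class_id
    return Image.fromarray(seg_map, mode="L")
```

You would then save the returned image as the single segmentation map for that original image.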

I think you have already done this based on your post.

I am assuming you have fewer than 256 labels, so all your classes can be encoded in a single-channel 8-bit mask image.

You can then follow this recipe to create the dataset.

Also create the id2label file, which is just the mapping from the grayscale value in the mask to the class name.
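For example, writing the id2label file could look like this (the class names here are placeholders; substitute your own grayscale-value-to-name mapping):

```python
import json

# Hypothetical classes -- replace with the values you used in your masks.
id2label = {
    0: "background",
    1: "cube",
    2: "sphere",
}

# JSON object keys must be strings, so convert the integer ids.
with open("id2label.json", "w") as f:
    json.dump({str(k): v for k, v in id2label.items()}, f, indent=2)
```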

Hi panigrah,

thank you very much. That helped a lot. I implemented it today and I think I've got it now.