The documentation script for fine-tuning Mask2Former with Trainer does not support instance segmentation with overlapping instances

I have a use case for Mask2Former where image annotations must overlap.

One example: I have to detect “cracks” and “reparations”, as well as “cracks” within “reparations”, so “crack” instances will overlap “reparation” instances.

This is part of the instance segmentation paradigm, and models such as Mask2Former support it natively.

The documentation suggests storing the masks in a single image:

  • The Red channel is used to store the Label ID
  • The Green channel is used to store the Instance ID

Ref: https://github.com/huggingface/transformers/tree/main/examples/pytorch/instance-segmentation

This does not allow instances to overlap, as each pixel can only carry one class and one instance.
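To make the limitation concrete, here is a minimal sketch of that encoding (shapes, class IDs and the file name are hypothetical): writing the “crack” instance overwrites the “reparation” pixels underneath it, so the overlap cannot be represented.

```python
import numpy as np
from PIL import Image

# One RGB annotation image: red channel = class (label) ID, green channel = instance ID.
height, width = 256, 256
annotation = np.zeros((height, width, 3), dtype=np.uint8)

# Instance 1: a "reparation" region (class ID 1, instance ID 1).
annotation[50:150, 50:150, 0] = 1   # red   = class ID
annotation[50:150, 50:150, 1] = 1   # green = instance ID

# Instance 2: a "crack" inside the reparation (class ID 2, instance ID 2).
# These assignments overwrite the reparation pixels underneath: a pixel can
# only hold one (class, instance) pair, so the overlap is lost.
annotation[80:120, 80:120, 0] = 2
annotation[80:120, 80:120, 1] = 2

Image.fromarray(annotation).save("0001_annotation.png")
```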

This image is then decoded by AutoImageProcessor into mask_labels and class_labels, where mask_labels is (if I am not mistaken) an N×H×W binary tensor, N being the number of instances, and class_labels is an integer tensor of length N storing the Label ID of each instance.
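For reference, a sketch of how such an annotation image can be handed to the image processor (the checkpoint and file names are placeholders, and passing ignore_index=0 to mark background pixels is my assumption, not necessarily what the example script does):

```python
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-instance")

image = Image.open("0001.png").convert("RGB")
annotation = np.array(Image.open("0001_annotation.png"))

class_id_map = annotation[..., 0]   # red channel  = class (label) IDs
instance_seg = annotation[..., 1]   # green channel = instance IDs

# Map each instance ID to its class ID; instance ID 0 is treated as background.
instance_to_class = {
    int(i): int(class_id_map[instance_seg == i][0])
    for i in np.unique(instance_seg)
    if i != 0
}

inputs = processor(
    images=image,
    segmentation_maps=instance_seg,
    instance_id_to_semantic_id=instance_to_class,
    ignore_index=0,          # pixels with instance ID 0 belong to no instance
    return_tensors="pt",
)

print(inputs["mask_labels"][0].shape)   # (num_instances, height, width) binary masks
print(inputs["class_labels"][0])        # (num_instances,) class IDs
```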

I can see a workaround. In my Hugging Face dataset, I would create (sketched after the list):

  • A column for the image
  • A column for the list of instance masks (i.e. an N×H×W tensor)
  • A column for the list of Label IDs, one per instance
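A minimal sketch of such a schema with the datasets library; the column names, and the idea of storing each instance mask as a separate binary image, are assumptions rather than an established convention.

```python
from datasets import Dataset, Features, Image, Sequence, Value

# Hypothetical schema: one row per image, with one binary mask and one class ID
# per instance.
features = Features(
    {
        "image": Image(),
        "instance_masks": Sequence(Image()),           # N binary masks per row
        "instance_classes": Sequence(Value("int32")),  # N class IDs per row
    }
)

dataset = Dataset.from_dict(
    {
        "image": ["0001.png"],
        "instance_masks": [["0001_mask_0.png", "0001_mask_1.png"]],
        "instance_classes": [[1, 2]],
    },
    features=features,
)
```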

Like many other users, I can only export instance segmentation annotations from my annotation software in standard formats such as COCO.

This means that my process is now…

  • Export my annotations in COCO or a similar format
  • Convert the COCO .json structure into the Hugging Face .jsonl structure
  • Store that in the Hugging Face dataset
  • Load the dataset in the training script
  • Decode the masks’ RLE into N×H×W and length-N tensors (see the sketch after this list)
  • Apply my transformations on the tensors
  • Feed the Trainer class
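For the RLE-decoding step, a sketch with pycocotools (the helper name is hypothetical, and the “counts” values are assumed to come from a standard COCO export):

```python
import numpy as np
from pycocotools import mask as mask_utils

def rle_to_tensors(rle_counts, class_ids, height, width):
    """Decode N COCO RLE masks into an (N, H, W) binary mask stack
    and a length-N class ID array."""
    masks = []
    for counts in rle_counts:
        rle = {"size": [height, width], "counts": counts}
        if isinstance(counts, list):
            # Uncompressed RLE: convert it to the compressed form first.
            rle = mask_utils.frPyObjects(rle, height, width)
        masks.append(mask_utils.decode(rle))            # (H, W) uint8
    return np.stack(masks, axis=0), np.asarray(class_ids, dtype=np.int64)
```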

I’m a bit surprised because the initial implementation of Mask2Former in Detectron2 natively supports loading COCO datasets.

Has this been dropped in the Hugging Face implementation, or am I missing something?


Hmm… it seems that your understanding is correct.


I have adapted the dataset and the run_instance_segmentation.py script, but it is not straightforward…

In the dataset I store the data like this:

  • Image
  • List of mask RLE strings
  • List of the corresponding class for each mask

Then I modified how the data is loaded in the augment_and_transform_batch function so that it decodes the RLE and creates a stack of masks.

There, I could not find a way to feed AutoImageProcessor a stack of masks, so I removed it and wrote my own preprocessing.
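A stripped-down sketch of that kind of replacement (column names, normalization constants and the missing augmentations are placeholders): it builds pixel_values, mask_labels and class_labels directly, which Mask2FormerForUniversalSegmentation accepts as labels in its forward pass.

```python
import numpy as np
import torch
from pycocotools import mask as mask_utils

IMAGE_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGE_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def transform_batch(examples):
    """Replace the image-processor call: decode per-instance RLE masks and
    build the model inputs by hand."""
    batch = {"pixel_values": [], "mask_labels": [], "class_labels": []}
    for image, rles, classes in zip(
        examples["image"], examples["mask_rles"], examples["mask_classes"]
    ):
        image = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
        height, width = image.shape[:2]

        # Decode each RLE string into an (H, W) binary mask and stack them.
        masks = np.stack(
            [mask_utils.decode({"size": [height, width], "counts": c}) for c in rles]
        )

        # (Augmentations on the image and the mask stack would go here.)

        pixel_values = torch.from_numpy(
            ((image - IMAGE_MEAN) / IMAGE_STD).transpose(2, 0, 1)
        )
        batch["pixel_values"].append(pixel_values)
        batch["mask_labels"].append(torch.from_numpy(masks).float())
        batch["class_labels"].append(torch.tensor(classes, dtype=torch.long))
    return batch
```

With a collate_fn that stacks pixel_values and passes mask_labels and class_labels through as lists of per-image tensors, such a batch can be fed to the Trainer directly.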

While this seems simple, there are lots of traps.

If someone has a better solution…


The big drawback of this solution is that, by removing the image processor, I end up with a non-standard Hugging Face model. As I want to use the Hugging Face integration with CVAT afterwards, it means I also need to adapt the integration itself…
