Hi,
As per the doc: transformers/examples/pytorch/instance-segmentation at main · huggingface/transformers · GitHub
The instance segmentation image processor for models such as Mask2Former works as follows:
In the dataset we store:
- The image
- A dual-channel mask (see the sketch below):
  - Channel 1: the ID of the label
  - Channel 2: the index of the instance
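To make the format concrete, here is a rough sketch of such an annotation (the sizes and IDs below are made up for illustration):

```python
import numpy as np

# Toy (H, W, 2) annotation: channel 1 = label ID, channel 2 = instance index.
# Values are made up; a real sample would come from the dataset.
H, W = 4, 6
annotation = np.zeros((H, W, 2), dtype=np.uint8)

# Instance 1: a "cat" (label ID 1) in the top half of the image
annotation[:2, :, 0] = 1  # label ID
annotation[:2, :, 1] = 1  # instance index

# Instance 2: a "dog" (label ID 2) in the bottom half
annotation[2:, :, 0] = 2  # label ID
annotation[2:, :, 1] = 2  # instance index
```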
The image processor will convert this into:
- A stack of binary masks of shape (N, H, W), where N is the number of instances
- A list of N integers holding the label ID of each mask
This is what models such as Mask2Former need.
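For reference, here is roughly how I understand that conversion with `Mask2FormerImageProcessor`, using the toy annotation above, a dummy image, and settings I picked myself (`do_resize=False`, `ignore_index=255`); the real settings would come from the checkpoint config:

```python
import numpy as np
from PIL import Image
from transformers import Mask2FormerImageProcessor

# Toy two-channel annotation (same layout as the sketch above)
H, W = 4, 6
annotation = np.zeros((H, W, 2), dtype=np.uint8)
annotation[:2, :, 0], annotation[:2, :, 1] = 1, 1  # instance 1 -> label 1
annotation[2:, :, 0], annotation[2:, :, 1] = 2, 2  # instance 2 -> label 2

image = Image.new("RGB", (W, H))  # dummy image standing in for the real photo

semantic_map = annotation[..., 0]  # channel 1: label ID
instance_map = annotation[..., 1]  # channel 2: instance index
# Map each instance index to its label ID
instance_id_to_semantic_id = {
    int(i): int(semantic_map[instance_map == i][0]) for i in np.unique(instance_map)
}

processor = Mask2FormerImageProcessor(do_resize=False, ignore_index=255)
inputs = processor(
    images=image,
    segmentation_maps=instance_map,
    instance_id_to_semantic_id=instance_id_to_semantic_id,
    return_tensors="pt",
)

print(inputs["mask_labels"][0].shape)  # torch.Size([2, 4, 6]) -> (N, H, W)
print(inputs["class_labels"][0])       # tensor([1, 2]) -> one label ID per mask
```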
Then, after inference, the inverse operation is performed: the stack of individual masks is converted back into a two-channel mask. This is a destructive operation: instance segmentation is supposed to allow instances to overlap, but with the two-channel mask this is no longer possible, since each pixel can carry only one instance and one label.
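For context, this is the post-processing step I am referring to, sketched with an off-the-shelf COCO instance checkpoint (my choice for illustration) and a dummy image:

```python
import torch
from PIL import Image
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor

checkpoint = "facebook/mask2former-swin-tiny-coco-instance"  # example checkpoint
processor = Mask2FormerImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.new("RGB", (64, 64))  # dummy image just to run the pipeline
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]  # (height, width)
)[0]

# "segmentation" is a single (H, W) map of segment IDs: each pixel belongs to
# at most one segment, so overlapping instances can no longer be represented.
print(result["segmentation"].shape)
# "segments_info" maps each segment ID back to its label ID and score.
print(result["segments_info"])
```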
My first attempt was to adapt the pipeline by removing the image processor and writing my own transformations. However, I also need to use the CVAT integration for Hugging Face, and if I change the outputs I would have to adapt that integration as well.
Do you know if the instance segmentation image processor can accept a stack of individual instance masks instead of this two-channel mask?
Best regards