Encoding masks for Mask2Former and Panoptic Segmentation

Hi everyone!

I’m trying to set up fine-tuning for a panoptic segmentation task. I already have everything working for semantic segmentation, with the mask containing the category id for each polygon; a single channel is enough there.

Now, I’d like to “upgrade” it to panoptic, so I need to encode the instance_id in the mask as well. I’ve seen some tutorials use RGB masks, with the first channel holding the category and the second an instance id, and others use a single channel with byte operations to pack both together.
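To make that concrete, the two encodings I’ve seen look roughly like this (purely illustrative; the shapes and ids are made up):

import numpy as np

h, w = 480, 640
category_id, instance_id = 7, 3

# (a) RGB-style mask: one channel carries the category, another the instance id
mask_rgb = np.zeros((h, w, 3), dtype=np.uint8)
mask_rgb[100:200, 100:200, 0] = category_id   # channel 0: category
mask_rgb[100:200, 100:200, 1] = instance_id   # channel 1: instance

# (b) single-channel mask: pack both into one integer, e.g. category * 1000 + instance
mask_packed = np.zeros((h, w), dtype=np.int32)
mask_packed[100:200, 100:200] = category_id * 1000 + instance_id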

I guess my general question is: what format does the model expect as the label mask?

If you have any insight you could share, thank you very much! :slight_smile:


I’m not sure which parts are essential, but this article might tell us which options to pass.

In case it helps anyone, here is the solution I found:

panoptic_seg_gt = rgb_to_id(panoptic_seg_gt)
inputs = self.processor(
    [image],
    [panoptic_seg_gt],
    instance_id_to_semantic_id=inst2class,
    return_tensors="pt",
)
  • image: a channel-first RGB image (if yours is channel-last, call image.transpose(2, 0, 1) on it first).
  • panoptic_seg_gt: the ground-truth panoptic map. In COCO panoptic format this is an RGB picture whose colors encode the item_id across the three channels, so it can hold 16M+ distinct items even though item_ids can get very large; rgb_to_id decodes it back into a 2D map of item_ids before it is passed to the processor. That was the main point I had misunderstood: we don’t need to encode the category_id here, because that information is carried by the dictionary passed as instance_id_to_semantic_id, and the processor maps item_ids to category_ids automatically.
  • instance_id_to_semantic_id: a dictionary that maps each item_id to its category_id.
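
For anyone who wants a copy-pasteable starting point, here is a minimal, self-contained sketch of the same call. It assumes the processor is transformers’ Mask2FormerImageProcessor and that rgb_to_id is the helper from transformers.image_transforms; the toy image, the ground-truth colors and the inst2class mapping are made up for illustration.

import numpy as np
from transformers import Mask2FormerImageProcessor
from transformers.image_transforms import rgb_to_id

processor = Mask2FormerImageProcessor()

# Toy channel-first RGB image of shape (3, H, W)
image = np.zeros((3, 64, 64), dtype=np.uint8)

# Toy COCO-panoptic-style ground truth: an RGB picture whose colors encode the
# item_id across the 3 channels (id = R + 256*G + 256**2*B), so real ids can go
# far beyond 255 even though this toy one is small
panoptic_seg_gt = np.zeros((64, 64, 3), dtype=np.uint8)
panoptic_seg_gt[10:40, 10:40] = (42, 0, 0)    # item_id 42

panoptic_seg_gt = rgb_to_id(panoptic_seg_gt)  # decode to a 2D map of item_ids

# Normally built from the segments_info of the panoptic JSON annotation:
# {segment["id"]: segment["category_id"] for segment in segments_info}
inst2class = {0: 0, 42: 7}  # item_id -> category_id (0 is the unlabeled background here)

inputs = processor(
    [image],
    [panoptic_seg_gt],
    instance_id_to_semantic_id=inst2class,
    return_tensors="pt",
)
print(inputs["pixel_values"].shape)      # batched image tensor
print(inputs["class_labels"][0])         # category id of each segment, e.g. tensor([0, 7])
print(inputs["mask_labels"][0].shape)    # one binary mask per segment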