Custom dataset with MaskFormer

I decided to use do_reduce_labels=True together with ignore_index=255, following this discussion about a similar case:

Additionally, I found a similar scenario in a tutorial for fine-tuning Mask2Former, where the config.json file also has do_reduce_labels=True. According to the documentation, the preprocessing for MaskFormer and Mask2Former should be identical.

In the same file:

# We need to specify the label2id mapping for the model
# it is a mapping from semantic class name to class index.
# In case your dataset does not provide it, you can create it manually:
# label2id = {"background": 0, "cat": 1, "dog": 2}
label2id = dataset["train"][0]["semantic_class_to_id"]

if args.do_reduce_labels:
    label2id = {name: idx for name, idx in label2id.items() if idx != 0}  # remove background class
    label2id = {name: idx - 1 for name, idx in label2id.items()}  # shift class indices by -1
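
To make the effect of those two lines concrete, here is a small worked example using the sample mapping from the comment above. The `do_reduce_labels` variable is only a stand-in for `args.do_reduce_labels`, and `id2label` is added here for illustration:

```python
# Sample mapping taken from the comment in the snippet above.
label2id = {"background": 0, "cat": 1, "dog": 2}
do_reduce_labels = True  # stand-in for args.do_reduce_labels

if do_reduce_labels:
    # Drop the background class (index 0)...
    label2id = {name: idx for name, idx in label2id.items() if idx != 0}
    # ...then shift the remaining indices down by one.
    label2id = {name: idx - 1 for name, idx in label2id.items()}

# Inverse mapping, as the model config usually needs both directions.
id2label = {idx: name for name, idx in label2id.items()}

print(label2id)  # {'cat': 0, 'dog': 1}
print(id2label)  # {0: 'cat', 1: 'dog'}
```

So after the reduction the model only knows about the real classes, and index 0 belongs to the first of them rather than to the background.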

From what I gather (though I'm not entirely sure, as the documentation isn't very clear), when you don't want to treat the background as a segmentable class, the preprocessor replaces the background pixels (label 0) in the segmentation map with the value 255. This value is then ignored during loss computation.

That is why I set ignore_index=255. I don't think the ignore_index value can be chosen arbitrarily: for example, if I have {0: 'garden', 1: 'car', 2: 'tree'} and set ignore_index=1, the 'car' class will be ignored during loss computation.
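
A minimal NumPy sketch of that concern, assuming the preprocessor turns the segmentation map into one binary mask per label and drops whichever label equals ignore_index. `split_into_binary_masks` is a hypothetical helper written for this post, not a Transformers API, so treat it as an illustration of my understanding rather than the actual implementation:

```python
import numpy as np

def split_into_binary_masks(seg_map, ignore_index=None):
    """Hypothetical helper: one boolean mask per label, skipping ignore_index."""
    labels = np.unique(seg_map)
    if ignore_index is not None:
        # The ignored label never becomes a training target.
        labels = labels[labels != ignore_index]
    masks = [seg_map == label for label in labels]
    return labels.tolist(), masks

# 0 = garden, 1 = car, 2 = tree
seg_map = np.array([[0, 0, 1],
                    [2, 1, 1],
                    [2, 2, 0]])

kept, _ = split_into_binary_masks(seg_map, ignore_index=1)
print(kept)  # [0, 2] -> 'car' is silently dropped

kept, _ = split_into_binary_masks(seg_map, ignore_index=255)
print(kept)  # [0, 1, 2] -> nothing dropped, 255 does not occur in the map
```

Under that assumption, picking a value that collides with a real class id removes that class from training, which is why 255 is the usual choice.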

With do_reduce_labels=True, every label is shifted by -1 so that, once the background has been removed, the remaining classes start at 0 and count upward.
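
As a rough illustration of the behaviour described above (background mapped to 255, every other label shifted by -1), here is a NumPy sketch. `reduce_labels` is a hypothetical helper that mimics what I understand the preprocessor to do; it is not copied from the library:

```python
import numpy as np

def reduce_labels(seg_map, ignore_index=255):
    """Hypothetical helper mimicking the described behaviour, not library code."""
    seg_map = seg_map.astype(np.int64)  # astype returns a copy; the input is untouched
    background = seg_map == 0
    already_ignored = seg_map == ignore_index
    seg_map = seg_map - 1  # shift every class index down by one
    # Background and pixels that were already ignored both end up as ignore_index.
    seg_map[background | already_ignored] = ignore_index
    return seg_map

# 0 = background, 1 and 2 = real classes
seg_map = np.array([[0, 1, 1],
                    [2, 2, 0]])

print(reduce_labels(seg_map).tolist())  # [[255, 0, 0], [1, 1, 255]]
```

In my single-class case this means the original label 1 becomes class 0 and everything that was background becomes 255.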

Example (models trained for 20 epochs with a learning rate of 5e-5)

Test Image:

Preprocessor for MaskFormer:

self.processor = AutoImageProcessor.from_pretrained(

Results with MaskFormer:

       Test metric             DataLoader 0
        test_loss           1.0081120729446411
        test_map           0.038004860281944275
       test_map_50          0.06367719173431396
       test_map_75         0.040859635919332504
     test_map_large         0.5004204511642456
     test_map_medium        0.04175732284784317
     test_map_small        0.007470746990293264
       test_mar_1           0.01011560671031475
       test_mar_10          0.05838150158524513
      test_mar_100          0.06329479813575745

Test Image Result with MaskFormer:

Preprocessor for Mask2Former:

self.id2label = {0: "unhealty"}
self.label2id = {v: int(k) for k, v in self.id2label.items()}
self.processor = AutoImageProcessor.from_pretrained(

Results with Mask2Former:

       Test metric             DataLoader 0
        test_loss           15.374979972839355
        test_map            0.44928184151649475
       test_map_50          0.6224347949028015
       test_map_75          0.5011898279190063
     test_map_large         0.8390558958053589
     test_map_medium        0.6270320415496826
     test_map_small         0.32075226306915283
       test_mar_1           0.03526011481881142
       test_mar_10          0.24104046821594238
      test_mar_100          0.5274566411972046

Test Image Result with Mask2Former:

As you can see, the results are very different even though the code is identical apart from the parts where the model type is changed. If you want, I can share the code.