Mask2Former IoU a lot worse than MaskFormer's IoU on the same dataset

Hi everybody,

I followed this tutorial to train MaskFormer on my own custom dataset and evaluated the model using the mIoU metric. It all worked pretty well, but when I replaced MaskFormer with Mask2Former, some things got worse:

  • The loss with Mask2Former is a lot higher than with MaskFormer. Here is Mask2Former’s loss output for the first training epoch:
    Training loss:  79.512253
    Training loss:  37.162781
    Training loss:  30.328207
    Training loss:  27.043933
    Training loss:  24.621095
    Training loss:  22.606757
    Training loss:  20.985848
    Training loss:  19.564922
    Training loss:  18.618953
    Training loss:  17.590006
    Training loss:  16.654683
    Training loss:  15.866917
    Training loss:  15.126552
    
    compared to MaskFormer’s loss:
    Training loss:  335.591675
    Training loss:  8.424562
    Training loss:  4.971408
    Training loss:  3.780433
    Training loss:  3.152624
    Training loss:  2.738412
    Training loss:  2.435367
    Training loss:  2.196829
    Training loss:  1.996039
    Training loss:  1.824578
    Training loss:  1.676588
    Training loss:  1.550934
    Training loss:  1.444919
    
  • The mean IoU, mean accuracy and overall accuracy are a lot worse with Mask2Former. Here are the metrics for Mask2Former:
    Mean IoU: 0.0139 | Mean Accuracy: 0.0323 | Overall Accuracy: 0.0350
    
    compared to MaskFormer’s metrics:
    Mean IoU: 0.6466 | Mean Accuracy: 0.9391 | Overall Accuracy: 0.9352
    

As far as I can tell, the inference results from Mask2Former don’t look bad, so I really can’t explain this difference in the metrics.
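For context, here is a minimal sketch of how I understand these three metrics to be derived from predicted and ground-truth label maps. It is not the code from the tutorial and not the exact implementation of any library. The function name `segmentation_metrics`, the `ignore_index=255` default and the toy arrays are assumptions for illustration, and it assumes every remaining label lies in `[0, num_classes)`.

```python
import numpy as np


def segmentation_metrics(pred, target, num_classes, ignore_index=255):
    """Return (mean IoU, mean accuracy, overall accuracy) for integer label maps.

    Hypothetical helper for illustration; assumes labels are in [0, num_classes)
    after pixels equal to ignore_index in the ground truth have been dropped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels that the ground truth marks as "ignore".
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    true_pos = np.diag(conf).astype(float)
    gt_pixels = conf.sum(axis=1)      # pixels per ground-truth class
    pred_pixels = conf.sum(axis=0)    # pixels per predicted class
    union = gt_pixels + pred_pixels - true_pos

    # Average only over classes that actually occur in the ground truth.
    present = gt_pixels > 0
    mean_iou = (true_pos[present] / union[present]).mean()
    mean_acc = (true_pos[present] / gt_pixels[present]).mean()
    overall_acc = true_pos.sum() / conf.sum()
    return mean_iou, mean_acc, overall_acc


# Toy ground truth with three classes.
target = np.array([[0, 0, 1],
                   [1, 2, 2]])

# Same regions, same class IDs.
perfect_scores = segmentation_metrics(target.copy(), target, num_classes=3)

# Same regions, but every class ID shifted by one (0->1, 1->2, 2->0).
shifted_scores = segmentation_metrics((target + 1) % 3, target, num_classes=3)

print("perfect:", perfect_scores)
print("shifted:", shifted_scores)
```

In this toy case, a prediction whose regions have exactly the right shape but permuted class IDs scores zero on all three metrics, which is one way masks could look reasonable while the numbers are poor. I have not confirmed that this is what happens in my setup.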

Concerning the code, all I did was replace

MaskFormerConfig,
MaskFormerImageProcessor,
MaskFormerModel,
MaskFormerForInstanceSegmentation

with

Mask2FormerConfig,
Mask2FormerImageProcessor,
Mask2FormerModel,
Mask2FormerForUniversalSegmentation

and

config = MaskFormerConfig.from_pretrained("facebook/maskformer-swin-base-coco")

with

config = Mask2FormerConfig.from_pretrained("facebook/mask2former-swin-base-coco-instance")

Also, training with Mask2Former seems to be a lot slower than with MaskFormer.
Any suggestions about what I might have done wrong or misunderstood?
Any help is appreciated, thanks a lot.