Mask2Former not performing as expected


I am working on a project to see how different models/architectures perform on my custom dataset for semantic segmentation. I want to train the models from scratch, with no pre-trained weights. I am comparing models such as ResNet-50, SegFormer, and Mask2Former. I load all my images and masks with a DataLoader. For ResNet-50, I use the FCN with a ResNet-50 backbone provided by torchvision: fcn_resnet50 — Torchvision main documentation

For Mask2Former from scratch, I am doing the following, but the results are terrible, even worse than the ResNet-50 FCN. I am using only the Mask2Former model and nothing else, such as the image processor.

    configuration = Mask2FormerConfig(**dict(arch['args']))
    configuration.num_queries = data['num_classes']
    model = Mask2FormerForUniversalSegmentation(configuration)

    for _, images, masks in dataloader:
        images =, non_blocking=True)
        masks =, non_blocking=True)
        outputs = model(pixel_values=images)
        outputs = outputs.masks_queries_logits
        outputs = nn.functional.interpolate(outputs, size=masks.shape[-2:], mode="bilinear", align_corners=False)

I have also tried with pretrained weights, but I get the same results.

model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-small-ade-semantic", num_queries=data['num_classes'], ignore_mismatched_sizes=True)

I get this warning with the pretrained weights:

Some weights of  were not initialized from the model checkpoint at facebook/mask2former-swin-small-ade-semantic and are newly initialized because the shapes did not match:

Does anyone have an idea about this? Really stuck currently. Would appreciate any advice.


What’s the reason you’d like to train the model from scratch? Note that training from scratch requires quite some compute, e.g. the authors used 8 V100 GPUs for that. It might be beneficial to just fine-tune the head layers for your custom dataset.

Refer to these tutorial notebooks regarding fine-tuning: Transformers-Tutorials/MaskFormer at master · NielsRogge/Transformers-Tutorials · GitHub (Mask2Former fine-tuning is identical to MaskFormer fine-tuning).

1 Like

Hello @nielsr,

Thanks for the response.

As mentioned above, I am comparing several different architectures. I am training from scratch because not all of these models have pre-trained weights available. Also, I’ve read that pre-trained weights and fine-tuning are only really effective when the datasets are in the same domain, but my dataset is not similar to the ADE dataset the pretrained weights were trained on.

Regardless, I have tried to implement the fine-tuning tutorial that you linked with my dataset, but have a few questions.

  1. Loss function: Mask2Former returns a Mask2FormerForUniversalSegmentationOutput, which already contains its own loss. How is that loss calculated? I want to use my own loss function (Dice loss) to compute the loss and use it for backpropagation. How can I get the raw logits in the shape (batch, num_classes, height, width) to feed into my loss function?

  2. Metrics: One odd thing that happens with the metrics defined in the tutorial is that when the IoU per category is calculated, the IoU for my background class is always 0. I want the background class (label = 0) to be included in the IoU calculation.
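Regarding question 1: the model’s two heads can be combined into per-pixel class scores. A minimal sketch with dummy tensors (the shapes and values here are assumptions for illustration), mirroring the combination that the image processor’s semantic post-processing performs internally:

```python
import torch

# Dummy outputs with Mask2Former's shapes (assumed values for illustration)
batch, num_queries, num_classes, h, w = 2, 100, 5, 96, 96
class_queries_logits = torch.randn(batch, num_queries, num_classes + 1)  # +1 for "no object"
masks_queries_logits = torch.randn(batch, num_queries, h, w)

# Drop the "no object" class and turn both heads into probabilities
class_probs = class_queries_logits.softmax(dim=-1)[..., :-1]  # (batch, queries, classes)
mask_probs = masks_queries_logits.sigmoid()                   # (batch, queries, h, w)

# Weight each query's mask by its class probabilities and sum over queries
semantic_map = torch.einsum("bqc,bqhw->bchw", class_probs, mask_probs)
print(semantic_map.shape)  # (batch, num_classes, h, w)
```

The resulting (batch, num_classes, height, width) tensor can then be interpolated to the label size and fed into a Dice loss; note these are probabilities summed over queries rather than raw unnormalized logits, so the loss should be chosen accordingly.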

I don’t think num_queries should be num_classes; num_queries refers to the number of object proposals the model makes, not the number of classes. You can see these are separate things in the post_process_instance_segmentation function for Mask2Former.
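To make the distinction concrete, a small sketch (the specific values are assumptions, not from the thread) setting the two options independently on the config:

```python
from transformers import Mask2FormerConfig

# num_queries and num_labels are independent knobs (values assumed for illustration)
config = Mask2FormerConfig(
    num_queries=100,  # how many mask proposals the decoder emits (the library default)
    num_labels=5,     # how many semantic classes the classification head predicts
)
print(config.num_queries, config.num_labels)
```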

also, maybe check out the reduce_labels arg to handle the background correctly? I’m trying this out and am curious whether it solves your IoU problem: Train a MaskFormer Segmentation Model with Hugging Face Transformers - PyImageSearch

following the tutorial @nielsr linked, you also need to set ignore_index to 0 in the processor so that the background class isn’t picked up as an object class
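A toy sketch of what setting ignore_index to 0 effectively means when a semantic map is converted into per-class binary masks (the labels here are made up; this illustrates the idea, not the processor’s exact implementation):

```python
import numpy as np

# Toy semantic mask: 0 = background, 1 and 2 = real classes (assumed labels)
segmentation_map = np.array([[0, 1, 1],
                             [0, 2, 2],
                             [0, 0, 2]])

ignore_index = 0  # the background label should not become an object class

# Conceptually: one binary mask per class actually present, skipping the ignored label
class_labels = [c for c in np.unique(segmentation_map) if c != ignore_index]
mask_labels = [(segmentation_map == c).astype(np.float32) for c in class_labels]

print(class_labels)  # background never becomes a query target
```

With background excluded this way, the matcher only assigns queries to real classes, which is also why the background IoU ends up undefined or 0 if you later expect the model to predict it explicitly.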

Hi @rbavery @nielsr

Is it clear to you how to use the ignore_index and reduce_labels arguments?

I’m trying to do binary segmentation using Mask2FormerForUniversalSegmentation.from_pretrained, and I’m also following exactly the tutorial you already mentioned.

I have a single class and I’m not able to make the model converge. Does Mask2Former support binary segmentation, or should I treat this as 2-class segmentation (background and my class)? If so, should I consider anything special in the configuration, e.g. ignore_index or reduce_labels?

Appreciate your help!