Mask2Former setup for binary segmentation

Hello,

I am trying to fine-tune a Mask2Former model for a binary segmentation task where 0 is the background and 1 is my object. I am initializing the processor and model as follows:

IMAGE_PROCESSOR = Mask2FormerImageProcessor.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    do_rescale=False,
    do_normalize=True,
    do_resize=False,
    num_labels=2,
    ignore_index=0,
)

MODEL = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

My dataset randomly rotates and flips the training and validation data, and it also takes a RandomCrop of the image and mask. This setup has already worked for training an LRASPP model from scratch.
However, every time I train the Mask2Former model, the accuracy converges to 0 after a few epochs and it starts outputting only null tensors.
Is something wrong with my initialization? I have already looked for other threads about this, but there aren't many. On Hugging Face there is one from a year ago, but it was never answered.
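For reference, the augmentation is roughly equivalent to the following sketch (written here with torchvision's v2 transforms for illustration; the exact parameters are illustrative):

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

# joint image/mask augmentation: v2 transforms apply the same random
# parameters to both inputs (requires torchvision >= 0.16 for tv_tensors)
augment = v2.Compose([
    v2.RandomRotation(degrees=90),
    v2.RandomHorizontalFlip(p=0.5),
    v2.RandomVerticalFlip(p=0.5),
    v2.RandomCrop(size=(512, 512)),
])

image = torch.rand(3, 1024, 1024)
mask = tv_tensors.Mask(torch.zeros(1024, 1024, dtype=torch.long))
image_aug, mask_aug = augment(image, mask)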

I will try implementing something that guarantees the input will never be a null tensor and see whether that improves things, but at this point I am not very hopeful.

I posted this problem with more information in the “Models” category since it fits better there. This thread can be closed.

Hi @beschmitt, did you try training with do_reduce_labels=True + ignore_index=255 instead?
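That would look roughly like this (a sketch reusing the checkpoint from your original post; do_reduce_labels=True remaps label 0 to 255 and shifts the remaining labels down by one):

from transformers import Mask2FormerImageProcessor

# sketch: do_reduce_labels=True maps label 0 (background) to 255 and
# decrements the remaining labels, so ignore_index=255 skips the background
processor = Mask2FormerImageProcessor.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    do_reduce_labels=True,
    ignore_index=255,
)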

I tried using do_reduce_labels at one point, but that led to it always returning a null tensor.
I set ignore_index=0 because my background is labeled 0 and the object I want to detect is labeled 1. When I read the masks in, the background is 0 and the object is 255, but I divide the mask tensor by 255 to map it to 0 and 1, because I had trouble with other models before when I didn't do this.
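Concretely, the remapping looks roughly like this (variable names are illustrative):

import numpy as np

# masks load with background = 0 and object = 255; integer division by
# 255 maps the values to {0, 1}
raw_mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
binary_mask = raw_mask // 255  # background -> 0, object -> 1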

I will try your suggestion, though, and post the result.

Try following this code snippet.

Please note that your binary mask should use 255 for the background and that do_reduce_labels is set to False.

import requests
import torch
import numpy as np
from PIL import Image
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor


# configure the image processor: keep label ids as-is, treat 255 as the ignore index
processor: Mask2FormerImageProcessor = Mask2FormerImageProcessor.from_pretrained(
    "facebook/mask2former-swin-tiny-coco-instance",
    do_reduce_labels=False,
    ignore_index=255,
)

id2label = {
    0: "cat",  # relevant classes ids must start from 0
    255: "background",
}
label2id = {v: k for k, v in id2label.items()}

# load Mask2Former fine-tuned on Mapillary Vistas semantic segmentation
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-large-mapillary-vistas-semantic",
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
w, h = image.size

# create a dummy instance map with two instances, instance ids are "5" and "10"
instances_map = np.full((h, w), 255, dtype=np.uint8)
instances_map[: h // 2, :] = 5
instances_map[h // 2:, :] = 10

# map both instance ids to semantic class id 0 ("cat")
instance_id_to_semantic_id = {5: 0, 10: 0}

# prepare inputs; the image processor will create two binary masks, one for each instance
inputs = processor(
    images=[image],
    segmentation_maps=[instances_map],
    instance_id_to_semantic_id=instance_id_to_semantic_id,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# the outputs can be passed back to the processor for post-processing
predicted_semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]

Instance segmentation example:

In case you just need semantic segmentation (not instance segmentation), I would recommend looking at the Segformer example.

I have set it up so that the background is 255 and the object is 0. ignore_index is set to 255 and do_reduce_labels is set to False. I even implemented it so that every random crop contains at least several thousand pixels of the object, to prevent the model from learning from empty tensors, and it is still not working.
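The crop logic is roughly the following (a sketch; the helper name and the exact threshold are illustrative):

import torchvision.transforms.functional as F
from torchvision.transforms import RandomCrop

# sketch of the crop-rejection idea: resample crops until one contains
# enough object pixels (object = 0, background = 255 in this setup)
def crop_containing_object(image, mask, size=(512, 512),
                           min_object_pixels=5000, max_tries=50):
    # image: (C, H, W) tensor, mask: (H, W) tensor
    for _ in range(max_tries):
        top, left, h, w = RandomCrop.get_params(image, output_size=size)
        mask_crop = F.crop(mask, top, left, h, w)
        if (mask_crop == 0).sum() >= min_object_pixels:
            break
    # falls back to the last sampled crop if no crop met the threshold
    return F.crop(image, top, left, h, w), mask_crop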

Hi @beschmitt,
The example I provided above seems to run without errors. However, you can avoid the image processor entirely if it doesn't work for you and prepare the model input yourself; just make sure it is in the same format the model expects.

First, take a look at the officially provided example (see the links above), then prepare your input in the same format.
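For the binary setup in this thread, that would look roughly like the sketch below (the dummy data is illustrative; shapes follow the Mask2Former docs, where mask_labels is a list of (num_target_masks, height, width) float tensors and class_labels is a list with one class id per mask):

import torch
from transformers import Mask2FormerForUniversalSegmentation

model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

# dummy normalized image and a binary object mask
pixel_values = torch.rand(1, 3, 512, 512)   # (batch, channels, H, W)
object_mask = torch.zeros(512, 512)
object_mask[100:200, 150:300] = 1.0         # 1.0 where the object is

# one binary mask per target object, plus its class id
mask_labels = [object_mask.unsqueeze(0)]    # list of (num_masks, H, W)
class_labels = [torch.tensor([1])]          # class id 1 = object, as in this thread

outputs = model(
    pixel_values=pixel_values,
    mask_labels=mask_labels,
    class_labels=class_labels,
)
print(outputs.loss)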