I previously posted this topic in the Beginner category, but I feel it fits better as a general model question. There is also new information, and I am not yet able to edit my questions.
Hello,
I am trying to fine-tune a Mask2Former model for a binary segmentation task where 0 is the background and 1 is my object. I am initializing the processor and model as follows:
from transformers import Mask2FormerImageProcessor, Mask2FormerForUniversalSegmentation
IMAGE_PROCESSOR = Mask2FormerImageProcessor.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    do_rescale=False,
    do_normalize=True,
    do_resize=False,
    num_labels=2,
    ignore_index=0,
)
MODEL = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-IN21k-ade-semantic",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
I have also applied low-rank adaptation (LoRA) to mitigate catastrophic forgetting:
from peft import LoraConfig, get_peft_model
target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "class_predictor"]

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=target_modules,
    lora_dropout=0.3,
    bias="lora_only",
    init_lora_weights="pissa",
    use_rslora=True,
    modules_to_save=["decode_head"],
)
LORA_MODEL = get_peft_model(MODEL, config)
import torch

OPTIMIZER = torch.optim.AdamW(LORA_MODEL.parameters(), lr=1e-5)
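For reference, the training step itself is the standard Hugging Face Mask2Former loop. A simplified sketch (my_train_loader, DEVICE, and the batch keys here are illustrative, not my exact code):

# Simplified sketch of one training step; loader and batch keys are illustrative.
for batch in my_train_loader:
    # The processor builds pixel_values plus the per-image mask_labels /
    # class_labels lists that Mask2Former's set-prediction loss expects.
    inputs = IMAGE_PROCESSOR(
        images=batch["images"],
        segmentation_maps=batch["masks"],
        return_tensors="pt",
    )
    outputs = LORA_MODEL(
        pixel_values=inputs["pixel_values"].to(DEVICE),
        mask_labels=[m.to(DEVICE) for m in inputs["mask_labels"]],
        class_labels=[c.to(DEVICE) for c in inputs["class_labels"]],
    )
    loss = outputs.loss  # combined classification + mask loss
    OPTIMIZER.zero_grad()
    loss.backward()
    OPTIMIZER.step()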
My dataset randomly rotates and flips the training and validation data, and it also takes a RandomCrop of each image and its mask. The same pipeline has already worked for training an LRASPP model from scratch.
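In code, the joint transform looks roughly like this (the 384x384 crop size is just an example):

import random
import torchvision.transforms.functional as TF
from torchvision import transforms

def joint_transform(image, mask):
    # Apply identical random parameters to image and mask so they stay aligned.
    angle = random.choice([0, 90, 180, 270])
    image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    i, j, h, w = transforms.RandomCrop.get_params(image, output_size=(384, 384))
    return TF.crop(image, i, j, h, w), TF.crop(mask, i, j, h, w)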
However, every time I train the Mask2Former model, the accuracy converges to 0 after a few epochs and the model starts outputting only all-zero (background-only) predictions.
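By "null tensors" I mean that the post-processed prediction contains only the background class. A quick check like the following (sample_image is a placeholder for one of my float tensors, already scaled to [0, 1] since do_rescale=False) prints tensor([0]) after a few epochs:

import torch

LORA_MODEL.eval()
with torch.no_grad():
    inputs = IMAGE_PROCESSOR(images=[sample_image], return_tensors="pt")
    outputs = LORA_MODEL(pixel_values=inputs["pixel_values"])
# Returns one (H, W) label map per image at the requested target size.
pred = IMAGE_PROCESSOR.post_process_semantic_segmentation(
    outputs, target_sizes=[sample_image.shape[-2:]]
)[0]
print(torch.unique(pred))  # only tensor([0]) once training collapses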
Is something wrong with my initialization? I have already searched for other threads about this, but there aren't many; there is one on the Hugging Face forum from a year ago, but it was never answered.