Fine-tuning MobileViTV2 for Semantic Segmentation

Hi, what is the proper way to change the number of classes in the final layer when fine-tuning MobileViTV2 for semantic segmentation? When I specified the num_classes parameter while loading the model with from_pretrained, I kept getting an error that a label index (16) was out of bounds while computing the cross-entropy loss.

model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint, id2label=id2label, label2id=label2id, num_classes=15)
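An out-of-bounds target index usually means a mask contains a value that is not a valid class id for the head. Cross-entropy expects every target pixel to lie in [0, num_labels), apart from the ignore index. A quick check over the masks before training can catch this. This is a minimal NumPy sketch: `check_mask_labels` is a hypothetical helper, and `ignore_index=255` is an assumption meant to match the model config's `semantic_loss_ignore_index` (verify it for your setup).

```python
import numpy as np

def check_mask_labels(masks, num_labels, ignore_index=255):
    """Return the sorted label values in `masks` outside [0, num_labels), skipping ignore_index."""
    bad = set()
    for mask in masks:
        values = np.unique(np.asarray(mask))
        values = values[values != ignore_index]  # ignored pixels never reach the loss
        bad.update(int(v) for v in values if v < 0 or v >= num_labels)
    return sorted(bad)

# Toy masks: the first one contains label 16, which is invalid for a 15-class head.
masks = [np.array([[0, 3], [16, 255]]), np.array([[1, 14], [2, 2]])]
print(check_mask_labels(masks, num_labels=15))  # [16]
```

If this returns anything, either the masks need remapping or id2label needs more entries.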

I then tried keeping the original pretrained number of classes, even though the new task has fewer classes. The error went away, but the accuracy and IoU values for the extra (unused) classes were NaN.
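One plausible explanation for the NaN values: a class that never appears in either the ground truth or the predictions has an empty union, so its IoU is 0/0. The sketch below is a plain NumPy illustration of per-class IoU (`per_class_iou` is hypothetical), not the implementation of any particular metrics library.

```python
import numpy as np

def per_class_iou(pred, target, num_labels):
    """Per-class IoU; NaN where a class is absent from both pred and target (0/0)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = np.full(num_labels, np.nan)
    for c in range(num_labels):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union > 0:  # leave NaN when the class never occurs
            ious[c] = np.logical_and(p, t).sum() / union
    return ious

pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
ious = per_class_iou(pred, target, num_labels=4)
print(ious)              # classes 2 and 3 never occur, so their IoU is NaN
print(np.nanmean(ious))  # mean over the classes that actually occur
```

With an oversized head, every unused class ends up in the NaN bucket, which matches the symptom described above.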


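To replace the classification head of an already fine-tuned model, you need to pass ignore_mismatched_sizes=True. The number of output classes is taken from id2label (the config's num_labels is len(id2label)), so there is no need for a separate num_classes argument. The recommended way is: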

from transformers import AutoModelForSemanticSegmentation

repo_id = "apple/mobilevitv2-1.5-voc-deeplabv3"

id2label = {0: "bird", 1: "car"}
label2id = {v:k for k,v in id2label.items()}

model = AutoModelForSemanticSegmentation.from_pretrained(repo_id, id2label=id2label, label2id=label2id, ignore_mismatched_sizes=True)
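Because the size of the new head comes from len(id2label), it can help to build both mappings from one ordered list of class names. The ids are then contiguous from 0 and can be matched against the values in your masks. This is a minimal sketch: `make_label_maps` and the class names are hypothetical.

```python
def make_label_maps(class_names):
    """Build contiguous id2label / label2id dicts from an ordered list of class names."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id

# Hypothetical classes; the new head would get len(class_names) == 3 outputs.
id2label, label2id = make_label_maps(["background", "bird", "car"])
print(id2label)  # {0: 'background', 1: 'bird', 2: 'car'}
print(label2id)  # {'background': 0, 'bird': 1, 'car': 2}
```

The resulting dicts can be passed to from_pretrained together with ignore_mismatched_sizes=True, as in the call above.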

This logs a warning listing which weights are newly (randomly) initialized:

Some weights of MobileViTV2ForSemanticSegmentation were not initialized from the model checkpoint at apple/mobilevitv2-1.5-voc-deeplabv3 and are newly initialized because the shapes did not match:
- segmentation_head.classifier.convolution.weight: found shape torch.Size([21, 512, 1, 1]) in the checkpoint and torch.Size([2, 512, 1, 1]) in the model instantiated
- segmentation_head.classifier.convolution.bias: found shape torch.Size([21]) in the checkpoint and torch.Size([2]) in the model instantiated