Pretrained Model for Fine-Tuning has 100% Trainable Parameters

I believe I’m correctly following HuggingFace’s documentation on fine-tuning pretrained models, but I get a model with 100% trainable parameters. I thought only some layers would be unfrozen and optimized, but it looks like all of them are.

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

...
# id2label and label2id represent 3 classes in my current problem

from transformers import AutoModelForSemanticSegmentation

model_name = "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
model = AutoModelForSemanticSegmentation.from_pretrained(
    model_name, id2label=id2label, label2id=label2id, ignore_mismatched_sizes=True
)
print_trainable_parameters(model)

Prints the following:

Some weights of SegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/segformer-b5-finetuned-cityscapes-1024-1024 and are newly initialized because the shapes did not match:
- decode_head.classifier.weight: found shape torch.Size([19, 768, 1, 1]) in the checkpoint and torch.Size([3, 768, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([19]) in the checkpoint and torch.Size([3]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 84595651 || all params: 84595651 || trainable%: 100.00

Why are 100% of the parameters trainable? I could use PEFT to reduce the number of trainable parameters, but based on the warning message about decode_head.classifier I expected that only a small subset of the parameters would be left free to be optimized.
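If I go the PEFT route, I suppose it would look roughly like this (a minimal sketch, assuming the peft library is installed; the target_modules names "query" and "value" are an assumption about SegFormer's attention projection layer names and may need adjusting):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],  # assumed names of the attention projections in SegFormer
    lora_dropout=0.1,
    bias="lora_only",
    modules_to_save=["decode_head"],  # keep the newly initialized head fully trainable
)
lora_model = get_peft_model(model, lora_config)
print_trainable_parameters(lora_model)  # trainable% should now be a small fraction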


From the searches I’ve seen, it seems that many people manually freeze layers before training; see the sketch below.
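Something like this should do it (a minimal sketch, assuming the SegFormer model loaded above; the decode head is the part that contains the newly initialized classifier):

# Freeze everything, then unfreeze only the decode head
for param in model.parameters():
    param.requires_grad = False
for param in model.decode_head.parameters():
    param.requires_grad = True
print_trainable_parameters(model)  # trainable% now drops well below 100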

Mmm… then maybe all (?) of the HuggingFace documentation on fine-tuning should be edited accordingly, to clarify that simply passing id2label and/or label2id and ignore_mismatched_sizes=True when loading the pretrained model does not freeze all layers except the ones mentioned in the warning message. That’s not obvious, at least not to me: the warning doesn’t make it explicit whether only decode_head.classifier in my example will be trainable or whether, as it turns out, every layer will be.
