Pretrained Model for Fine-Tuning has 100% Trainable Parameters

I believe I’m correctly following HuggingFace’s documentation on fine-tuning pretrained models, but I get a model with 100% trainable parameters. I thought only some layers would be unfrozen and optimized, but it looks like all of them are.

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

...
# id2label and label2id represent 3 classes in my current problem

from transformers import AutoModelForSemanticSegmentation

model_name = "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
model = AutoModelForSemanticSegmentation.from_pretrained(
    model_name, id2label=id2label, label2id=label2id, ignore_mismatched_sizes=True
)
print_trainable_parameters(model)

Prints the following:

Some weights of SegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/segformer-b5-finetuned-cityscapes-1024-1024 and are newly initialized because the shapes did not match:
- decode_head.classifier.weight: found shape torch.Size([19, 768, 1, 1]) in the checkpoint and torch.Size([3, 768, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([19]) in the checkpoint and torch.Size([3]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 84595651 || all params: 84595651 || trainable%: 100.00

Why are 100% of the parameters trainable? I could use PEFT to reduce the number of trainable parameters, but based on the warning message about decode_head.classifier I expected that only a small subset of the parameters would be left free to be optimized.
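If I go the PEFT route, I suppose it would look roughly like this (a minimal sketch, assuming the peft library is installed; the target_modules names "query" and "value" are an assumption about SegFormer's attention projection layer names and may need adjusting):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],  # assumed names of the attention projections in SegFormer
    lora_dropout=0.1,
    bias="lora_only",
    modules_to_save=["decode_head"],  # keep the newly initialized head fully trainable
)
lora_model = get_peft_model(model, lora_config)
print_trainable_parameters(lora_model)  # trainable% should now be a small fraction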


From the searches I’ve seen, it seems that many people manually freeze layers before training; see the sketch below.
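Something like this should do it (a minimal sketch, assuming the SegFormer model loaded above; the decode head is the part that contains the newly initialized classifier):

# Freeze everything, then unfreeze only the decode head
for param in model.parameters():
    param.requires_grad = False
for param in model.decode_head.parameters():
    param.requires_grad = True
print_trainable_parameters(model)  # trainable% now drops well below 100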

Mmm… then maybe all (?) of the HuggingFace documentation on fine-tuning should be edited accordingly, to clarify that simply passing id2label and/or label2id and ignore_mismatched_sizes=True when loading the pretrained model does not freeze all layers except the ones mentioned in the warning message. That’s not obvious, at least not to me: the warning doesn’t make it explicit whether only decode_head.classifier in my example will be trainable or whether, as it turns out, every layer will be.
