Which weights change when fine-tuning a pre-trained model?

laurafuentesq · June 10, 2024, 2:10pm

Hello!

I am using AutoModelForSequenceClassification to fine-tune a pre-trained model (originally based on the GPT-2 architecture):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "Natooz/Maestro-REMI-bpe20k",
    trust_remote_code=True,
    torch_dtype="auto",
    num_labels=2,
)
```

And I am using the basic training loop suggested in the Hugging Face NLP course:

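```python
# accelerator, optimizer, lr_scheduler, train_dl, num_epochs and progress_bar
# are set up beforehand (not shown)
model.train()
for epoch in range(num_epochs):
    for batch in train_dl:
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)  # computes gradients for every parameter with requires_grad=True

        optimizer.step()  # updates the parameters that were passed to the optimizer
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
```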

My questions are:

1. Which weights of the model are going to change after using this training loop? Am I training all the weights of the model, or only the classification head added by AutoModelForSequenceClassification?

It wouldn’t make much sense to me for all the weights to change, because in theory I wouldn’t be “transferring” any knowledge from my pre-trained model to my task.

2. Would it work the same if I fine-tune my pre-trained model with the Trainer class instead of the training loop?

Thank you so much in advance!

nielsr · June 10, 2024, 2:25pm

Hi,

1. Which weights of the model are going to change after using this training loop? Am I training all the weights of the model, or only the classification head added by AutoModelForSequenceClassification?

By default, all weights are updated (full fine-tuning), including the classification head. You can see which parameters will get updated by printing this:

```python
for name, param in model.named_parameters():
    print(name, param.requires_grad)
```

Any parameter for which this prints True has its gradients computed, so it will get updated by the optimizer.
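To make the `requires_grad` check concrete, here is a minimal sketch of freezing everything except the classification head and counting trainable parameters before and after. It uses a small hypothetical toy module (`ToyClassifier`) instead of the real checkpoint so it runs offline; the attribute names `transformer` (backbone) and `score` (head) are an assumption modelled on GPT-2's sequence-classification naming, so check `named_parameters()` on your own model for the actual names.

```python
import torch
from torch import nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a GPT-2-style sequence classifier."""

    def __init__(self, vocab_size: int = 100, hidden: int = 32, num_labels: int = 2):
        super().__init__()
        # "transformer" plays the role of the pre-trained backbone
        self.transformer = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.Linear(hidden, hidden),
        )
        # "score" plays the role of the newly added classification head
        self.score = nn.Linear(hidden, num_labels, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.transformer(input_ids)  # (batch, seq, hidden)
        return self.score(hidden_states[:, -1])      # classify from the last token


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


model = ToyClassifier()
print("full fine-tuning:", count_parameters(model))  # (4320, 4320)

# Freeze the backbone: no gradients are computed for these parameters,
# so the optimizer leaves them unchanged.
for param in model.transformer.parameters():
    param.requires_grad = False

print("head only:", count_parameters(model))  # (64, 4320)

# Only pass the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)
```

Freezing only some of the layers (for example the lower blocks) is the same idea applied to a subset of `named_parameters()`.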

2. Would it work the same if I fine-tune my pre-trained model with the Trainer class instead of the training loop?

Yes. The Trainer API is basically for people who don’t want to write their own training loop: it makes fine-tuning models in the Transformers library easy. If you prefer to write your own training loop, we recommend Accelerate, which takes care of device placement for you.

laurafuentesq · June 11, 2024, 4:00pm

Thank you so much for your answer, Niels. Two follow-up questions:

1. How am I “transferring learning” if I am recomputing all the weights all over again? What is the difference between that and training the model from scratch? I thought the goal of fine-tuning was to save time and resources by reusing already computed weights.

2. I have a classification task, and this is my model:

My dataset is not that big. Do you have any suggestions on which layers to freeze and how to do it?

nielsr · June 11, 2024, 5:32pm

We’re only slightly adjusting the pre-trained weights (how much is determined by the learning rate), rather than recomputing them from scratch. Training from scratch would start from randomly initialized weights instead.
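To make “slightly adjusting” concrete: in plain gradient descent each step applies w ← w − lr · grad, so the learning rate bounds how far one step moves away from the starting (pre-trained) values. A minimal sketch, assuming a randomly initialized toy `nn.Linear` as a stand-in for a pre-trained layer (no real checkpoint) and a learning rate of 2e-5 chosen as a typical fine-tuning value:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained layer.
layer = nn.Linear(16, 2)
start = layer.weight.detach().clone()

lr = 2e-5
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

# One training step on a toy batch.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(layer(x), y)
loss.backward()
grad = layer.weight.grad.detach().clone()
optimizer.step()

# Plain SGD (no momentum, no weight decay) applies exactly w_new = w_old - lr * grad.
expected = start - lr * grad
print(torch.allclose(layer.weight.detach(), expected))

# Size of the change relative to the starting weights.
relative_change = ((layer.weight.detach() - start).norm() / start.norm()).item()
print(f"relative change after one step: {relative_change:.1e}")
```

Over many steps the weights move further, but fine-tuning still begins from the pre-trained values rather than from a random initialization, and that starting point is where the transferred knowledge comes from.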

My dataset is not that big. Do you have any suggestions on which layers to freeze and how to do it?

Usually we recommend full fine-tuning (updating all the weights). However, for larger models we have a new library called PEFT, which includes Parameter-Efficient Fine-Tuning techniques such as LoRA (this means freezing the pre-trained weights and only training small low-rank adapter matrices added to some of the layers).
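For intuition about what LoRA does, here is a minimal from-scratch sketch of a LoRA-wrapped linear layer in plain PyTorch. It is not the PEFT library's implementation; the class name `LoRALinear`, the rank `r=4`, `alpha=8` and the small random init of A are illustrative assumptions. The pre-trained weight W is frozen and only two small matrices A and B are trained, so the layer computes W x + (alpha / r) · B A x. Because B starts at zero, the wrapped layer initially behaves exactly like the original one.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pre-trained weights stay frozen
        # Low-rank factors: delta_W = B @ A has shape (out_features, in_features)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank update (x A^T B^T == x (B A)^T)
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


torch.manual_seed(0)
base = nn.Linear(64, 64)
lora = LoRALinear(base, r=4, alpha=8.0)

trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora.parameters())
print(f"trainable: {trainable} / {total}")  # trainable: 512 / 4672

x = torch.randn(3, 64)
# B is zero-initialized, so the wrapped layer starts out identical to the base layer.
print(torch.allclose(lora(x), base(x)))  # True
```

PEFT applies this kind of wrapping to selected layers of a Transformers model for you; its documentation describes the actual configuration options (such as which modules to target), so rely on that rather than on this sketch.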
