I am using AutoModelForSequenceClassification to fine-tune a pre-trained model (which is originally based on the GPT2 architecture):
model = AutoModelForSequenceClassification.from_pretrained("Natooz/Maestro-REMI-bpe20k", trust_remote_code=True, torch_dtype="auto", num_labels=2)
And I am using the basic training loop suggested in the NLP course from Hugging Face:
model.train()
for epoch in range(num_epochs):
    for batch in train_dl:
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)  # Accelerate handles the backward pass
        optimizer.step()            # update the weights
        lr_scheduler.step()         # advance the learning-rate schedule
        optimizer.zero_grad()       # reset gradients for the next batch
        progress_bar.update(1)
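For context, the objects used in this loop (accelerator, optimizer, lr_scheduler, progress_bar) come from the standard setup shown earlier in the course; a rough sketch, with placeholder hyperparameters:

from accelerate import Accelerator
from torch.optim import AdamW
from transformers import get_scheduler
from tqdm.auto import tqdm

accelerator = Accelerator()
optimizer = AdamW(model.parameters(), lr=5e-5)  # placeholder learning rate
num_epochs = 3                                  # placeholder
num_training_steps = num_epochs * len(train_dl)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
# prepare() takes care of device placement for everything
model, optimizer, train_dl, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dl, lr_scheduler
)
progress_bar = tqdm(range(num_training_steps))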
My questions are:
1. Which weights of the model are going to change after using this training loop? Am I training all the weights of the model, or only the classification head added by AutoModelForSequenceClassification?
It wouldn’t make a lot of sense to me for all the weights to change, because in theory I wouldn’t be “transferring” any knowledge from my pre-trained model to my task.
2. Would it work the same if I fine-tune my pre-trained model with the Trainer class instead of the training loop?
1. Which weights of the model are going to change after using this training loop? Am I training all the weights of the model, or only the classification head added by AutoModelForSequenceClassification?
By default, all weights are updated (full fine-tuning), including the classification head. You can see which parameters will be updated by printing them:
for name, param in model.named_parameters():
    print(name, param.requires_grad)
Any parameter for which this prints True has its gradients computed and will be updated by the optimizer.
2. Would it work the same if I fine-tune my pre-trained model with the Trainer class instead of the training loop?
Yes, the Trainer API is basically for people who don’t want to write their own training loop; it makes sure people can fine-tune models in the Transformers library in an easy way. If you prefer to write your own training loop, then we recommend Accelerate, which takes care of device placement for you.
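For reference, a minimal Trainer equivalent of the loop above (the dataset name train_ds and the hyperparameters are placeholders, not values from this thread):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="finetune-out",       # where checkpoints are saved
    num_train_epochs=3,              # placeholder hyperparameters
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,                     # the model loaded above
    args=training_args,
    train_dataset=train_ds,          # placeholder: your tokenized dataset
)
trainer.train()                      # performs the same full fine-tuning

By default this also updates all the weights, so the answer to question 1 is the same either way.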
Thank you so much, Niels, for your answer. Two additional things:
How am I "transfering learning " if I am recomputing all the weights all over again ? What is the difference between that and trainning the model from scratch ? I thought the goal of fine-tunning was saving time and resources by using already computed weights.
2. I have a classification task, and this is my model:
We’re only slightly adjusting the weights (the size of each update is controlled by the learning rate), rather than recomputing them from scratch.
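A toy single-step example (hypothetical numbers, just to show the scale of an update):

import torch

w = torch.tensor([1.0], requires_grad=True)  # stand-in for one pretrained weight
loss = (2.0 * w).sum()                       # toy loss with gradient dL/dw = 2.0
loss.backward()

lr = 5e-5                                    # typical fine-tuning learning rate
with torch.no_grad():
    w -= lr * w.grad                         # 1.0 -> 0.9999: a tiny nudge, not a reset

Training from scratch would instead start from random weights, so it needs far more such steps (and far more data) before the model is useful; fine-tuning starts from weights that are already close to good.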
My dataset is not that big. Do you have any suggestions on which layers to freeze and how to do it?
Usually we recommend performing full fine-tuning (updating all the weights). However, for larger models we have a new library called PEFT, which includes parameter-efficient fine-tuning techniques such as LoRA (this means freezing the entire model and only training a couple of adapter layers on top).
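If you do want to freeze layers by hand, the usual approach is to switch off requires_grad on everything except the head. A minimal sketch, assuming the head is named "score" as in GPT2ForSequenceClassification; since your checkpoint uses trust_remote_code, check model.named_parameters() first, as the head may be named differently:

# Freeze the transformer body; train only the classification head.
for name, param in model.named_parameters():
    if "score" not in name:  # "score" is the head in GPT2ForSequenceClassification
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")

With PEFT, the rough equivalent is building a LoraConfig (e.g. with task_type=TaskType.SEQ_CLS) and wrapping the model with get_peft_model, which freezes the base model and injects small trainable adapter layers.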