Can you fine-tune fine-tuned models?

Hi, I’m new to the community. I’m reading the course and I’m a bit confused by the concepts of fine-tuning, model heads, and so on in transformers. I’m more familiar with convnets, so I’ll try to draw an analogy to show what I don’t quite grasp.

In convnets, to fine-tune a model for image classification, for instance, you remove the last layer of the model, add a dense layer with a softmax activation at the end, freeze the rest of the model, and train it again with a smaller learning rate. What happens when you do the same with a transformer? What exactly is the “head”? Do you also freeze the rest of the model? I ask because I didn’t see that step in the course.
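A minimal sketch of how the convnet-style recipe above could map onto a transformer, assuming the Hugging Face `transformers` and `torch` packages. The tiny, randomly initialised `ElectraConfig` is a stand-in chosen only so the sketch runs without downloading anything; a real run would load a pretrained checkpoint with `from_pretrained`. Here the "body" is the encoder stack and the "head" is the small classifier sitting on top of it.

```python
import torch
from transformers import ElectraConfig, ElectraForSequenceClassification

# Assumption: a tiny random model instead of a real pretrained checkpoint.
config = ElectraConfig(
    vocab_size=100,
    embedding_size=16,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    num_labels=3,
)
model = ElectraForSequenceClassification(config)

# The "body" is the encoder; the "head" is the classifier on top of it.
body, head = model.electra, model.classifier

# Optional convnet-style freezing: the body receives no gradient updates.
for param in body.parameters():
    param.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# One training step on a fake batch: 4 sequences of 8 token ids each.
input_ids = torch.randint(0, config.vocab_size, (4, 8))
labels = torch.randint(0, config.num_labels, (4,))
body_before = body.embeddings.word_embeddings.weight.clone()
head_before = head.out_proj.weight.clone()

model.train()
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()

body_unchanged = torch.equal(body_before, body.embeddings.word_embeddings.weight)
head_changed = not torch.equal(head_before, head.out_proj.weight)
print(body_unchanged, head_changed)
```

Removing the freezing loop gives the other common setup, where every weight is updated with a small learning rate. That may be why no freezing step appears in the course material.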

I’ll try to be more concrete. I’m thinking of the model Recognai/selectra_medium, which is an Electra model fine-tuned for Spanish. Then there is Recognai/zeroshot_selectra_medium, which is a fine-tuned version of the former for zero-shot classification. Don’t you remove the head for fine-tuning? Doesn’t that take you back to the regular, non-fine-tuned model? Can I fine-tune the zero-shot classification model? Which head do I remove then, and how does this not break the earlier fine-tuning?
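One way to probe this question is to check what survives when a fine-tuned checkpoint is reloaded with a different head. The sketch below is an illustration under assumptions, not the procedure used for the models named above: `first_stage` is a tiny random model standing in for an already fine-tuned sequence-classification checkpoint, and the label counts (3, then 2) are made up. It relies on the `num_labels` and `ignore_mismatched_sizes` arguments of `from_pretrained`.

```python
import tempfile

import torch
from transformers import ElectraConfig, ElectraForSequenceClassification

# Assumption: a tiny random model plays the role of a fine-tuned checkpoint.
config = ElectraConfig(
    vocab_size=100,
    embedding_size=16,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    num_labels=3,
)
first_stage = ElectraForSequenceClassification(config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    first_stage.save_pretrained(checkpoint_dir)

    # Request a different number of labels. The output layer no longer fits
    # and is re-initialised; every weight whose shape matches is loaded.
    second_stage = ElectraForSequenceClassification.from_pretrained(
        checkpoint_dir, num_labels=2, ignore_mismatched_sizes=True
    )

    # Compare the encoder ("body") weights of both models, tensor by tensor.
    body_kept = all(
        torch.equal(old, new)
        for old, new in zip(
            first_stage.electra.parameters(),
            second_stage.electra.parameters(),
        )
    )
    new_head_size = second_stage.classifier.out_proj.out_features

print(body_kept, new_head_size)
```

In this sketch only the mismatched output layer starts from scratch. The encoder keeps whatever the first round of training put into it, so swapping the head does not by itself return you to the original pretrained weights.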

Thanks a lot in advance!


Hi @onturenio, were you able to find any answers to your questions? I have the same ones! :joy:


Unfortunately not yet. I’m still learning and cannot fully answer these questions yet. I’ll come back to this post if I ever end up understanding the issue :sweat_smile:



I have a similar question: training an LLM for more than 5 epochs at a time is increasingly difficult on Kaggle or Colab.

One strategy I came up with is to train for 3 epochs at a time and repeat this 5 times in sequence. Each session starts by loading the model saved at the end of the previous one and continues fine-tuning from there. I would say the underlying concept of “fine-tuning fine-tuned models” is the same as what you’re asking about.
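The session-by-session strategy above can be sketched in plain PyTorch. Everything here is a toy assumption: `run_session` is a hypothetical helper, the model is a single linear layer rather than an LLM, and the data is random. The point is only the pattern of saving model and optimizer state at the end of a session and loading both at the start of the next.

```python
import os
import tempfile

import torch
from torch import nn


def run_session(checkpoint_path, epochs=3):
    """Train for `epochs` epochs, resuming from `checkpoint_path` if present."""
    torch.manual_seed(0)  # same toy data in every session
    inputs = torch.randn(64, 4)
    targets = inputs.sum(dim=1, keepdim=True)

    model = nn.Linear(4, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
    epochs_done = 0

    # Resume: restore weights, optimizer state, and the epoch counter.
    if os.path.exists(checkpoint_path):
        state = torch.load(checkpoint_path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        epochs_done = state["epochs_done"]

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        epochs_done += 1

    # Save everything the next session needs to pick up where this one stopped.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epochs_done": epochs_done,
        },
        checkpoint_path,
    )
    return epochs_done, loss.item()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pt")
    history = [run_session(path) for _ in range(5)]  # 5 sessions x 3 epochs

total_epochs = history[-1][0]
print(total_epochs)
```

With the Hugging Face `Trainer`, the `resume_from_checkpoint` argument of `train()` plays a similar role. One caveat: if a learning-rate schedule restarts in every session, five 3-epoch sessions are not equivalent to a single 15-epoch run.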

Any thoughts?

Any news on this? All the tutorials start from, e.g., base llama2 for fine-tuning. Can I start from a model that has already been fine-tuned on top of base llama2?