Hi, I’m new to the community. I’m working through the course and I’m a bit confused by the concepts of fine-tuning, model heads, and so on in Transformers. I’m more familiar with convnets, so I’ll draw an analogy to highlight what I don’t quite grasp.
In convnets, to fine-tune a model for image classification, for instance, you remove the last layer, add a dense layer with a softmax at the end, freeze the rest of the model, and train again with a smaller learning rate. What’s going on when you do the same to a Transformer? What exactly is the “head”? Do you also freeze the rest of the model? I didn’t see that step anywhere in the course.
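To make the convnet recipe I have in mind concrete, here’s a minimal pure-Python sketch of it (all names are made up for illustration; this is not the transformers API, just my mental model of “swap the head, freeze the body”):

```python
# Toy stand-in for a pretrained model: a "body" of learned layers plus a
# task-specific "head". The strings pretend to be weight tensors.

def build_pretrained():
    body = {"layer1": "pretrained_w1", "layer2": "pretrained_w2"}
    head = {"dense": "pretraining_head_w"}  # head used during pretraining
    return body, head

def fine_tune_for_classification(body, num_labels):
    # Convnet-style recipe: drop the old head, attach a freshly initialized
    # dense layer for num_labels classes, and keep the body frozen.
    new_head = {"dense": f"random_init_{num_labels}_labels"}
    frozen_body = {name: w for name, w in body.items()}  # body is reused as-is
    return frozen_body, new_head

body, head = build_pretrained()
frozen_body, clf_head = fine_tune_for_classification(body, num_labels=3)
assert frozen_body == body   # body weights are reused, not reinitialized
assert clf_head != head      # the head is brand new
```

Is this the right picture for Transformers too, or does fine-tuning there usually update the body as well?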
To be more concrete, I’m thinking of this model: Recognai/selectra_medium · Hugging Face, which is an Electra model trained for Spanish. Then there’s Recognai/zeroshot_selectra_medium · Hugging Face, which is a fine-tuned version of the former for zero-shot classification. Don’t you remove the head when fine-tuning? Doesn’t that get you back to the plain, non-fine-tuned model? Can I fine-tune the zero-shot classification model further? Which head do I remove then, and how does that not undo the earlier fine-tuning?
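Here’s the same confusion as a toy sketch (again, invented names, not the real internals of those checkpoints): my guess is that only the head gets swapped, so whatever the body learned during the earlier fine-tuning is kept rather than reset.

```python
# Invented sketch of the two-stage fine-tuning question.

electra_body = {"encoder": "weights_after_zeroshot_finetuning"}  # body
zeroshot_head = {"nli": "zeroshot_head_weights"}                 # current head

# Fine-tuning again for a new task: swap in a freshly initialized head.
new_head = {"classifier": "random_init"}

# If only the head is replaced, the body keeps everything it learned so far;
# removing the head does NOT roll the body back to the original checkpoint.
fine_tuned = {**electra_body, **new_head}
assert "encoder" in fine_tuned     # body survives the head swap
assert "nli" not in fine_tuned     # old head is gone
```

Is that guess right, or does removing the head lose something the zero-shot fine-tuning added?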
Thanks a lot in advance!