Does anyone know what happens under the hood when we fine-tune a pretrained model (e.g., BERT) via Hugging Face Transformers?
I know there are multiple ways to fine-tune, such as freezing the earlier layers and training only a fully connected layer at the end of the model (see the sketch below for what I mean).
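For example, here is roughly what I mean by the freezing approach. This is just an illustrative sketch using `bert-base-uncased` and a 2-label classification head, not a claim about what the library does internally:

```python
# Illustration of the "freeze the encoder, train only the head" approach
# (assumes bert-base-uncased and a 2-label classification task).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze all parameters of the pretrained BERT encoder...
for param in model.bert.parameters():
    param.requires_grad = False

# ...so only the newly initialized classification head would be trained.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # e.g. ['classifier.weight', 'classifier.bias']
```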
Which approach does the Hugging Face Transformers library use by default? I am writing a paper, and my experiments involve fine-tuning models from Hugging Face, so I would like to know the specifics of the fine-tuning procedure in order to describe it accurately.
Thank you