Hi,
When we fine-tune a pretrained model using the Trainer API that HF provides, what actually happens behind the scenes?
I am struggling to find any resources out there that describe how the Trainer API carries out fine-tuning — whether it freezes some layers, adds a new layer, trains only the last few layers, etc.
Here is my current understanding of this process.
Typically, when you fine-tune with HF, you load a tokenizer and a downstream task model. The model is the pretrained base with a downstream task head added on top. When you "train" with the Trainer, it keeps the tokenizer fixed (from my understanding) and you are just updating the weights of the downstream task model.
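To make the question concrete, here is a small sketch of the freezing behaviour I'm asking about. This is plain PyTorch with a toy model (the class and layer names here are made up, not actual HF internals): by default every parameter has `requires_grad=True`, so nothing is frozen unless you freeze it yourself — which is what I'd like to confirm the Trainer does (or doesn't do) behind the scenes.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained backbone plus a task head
# (hypothetical names; real HF model classes differ).
class ToyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)  # pretend "pretrained" layers
        self.head = nn.Linear(8, 2)      # newly added task head

    def forward(self, x):
        return self.head(self.backbone(x))

model = ToyClassifier()

# By default, every parameter is trainable -- nothing is frozen.
print(all(p.requires_grad for p in model.parameters()))  # True

# Freezing the backbone is an explicit, opt-in step:
for p in model.backbone.parameters():
    p.requires_grad = False

# Now only the head's parameters would receive gradient updates.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

If the Trainer doesn't do anything like the freezing loop above internally, then I'd expect full fine-tuning (all weights updated) to be the default — is that right?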
Is this correct? Can someone recommend some reading resources?