Fine-tuning T5 with Trainer for novel task

Hi guys,

I am trying to fine-tune T5 with Hugging Face’s Trainer class, reusing as much of the existing training code as possible.

However, I am wondering what the Trainer.train() method actually does. In the T5 paper the authors mention three fine-tuning methods that they used (§3.5.1):

  • training only additional adapter layers
  • gradually unfreezing the model and training more and more layers
  • training the whole model right away

What does the Huggingface Trainer.train() do? And is there a simple way of switching between strategies?

By default, Trainer.train() trains the entire model (i.e. all layers at once), which corresponds to the third strategy from the paper. There is no built-in switch for the other two.
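You can verify this by counting trainable parameters before training. A minimal sketch (the helper name `count_trainable` is my own; the same check works on a loaded `T5ForConditionalGeneration`, shown here with a stand-in module to keep it self-contained):

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> tuple:
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

# On a freshly loaded model every parameter has requires_grad=True,
# so the two counts are equal -- Trainer will update all of them.
```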

To freeze certain layers before calling Trainer.train(), disable gradients on their parameters:

for name, param in model.named_parameters():
    if name == "...":  # match the parameter name(s) you want to freeze
        param.requires_grad = False
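For the gradual-unfreezing strategy, one approach is to adjust requires_grad between epochs, e.g. from a TrainerCallback's on_epoch_begin hook (the model is passed to callbacks via kwargs). A rough sketch of the freezing logic, kept torch-only so it is self-contained (the helper name is my own, and model.encoder.block / model.decoder.block are assumptions about the transformers T5 module layout):

```python
import torch.nn as nn

def unfreeze_last_blocks(blocks, num_unfrozen: int) -> None:
    """Freeze every block except the last `num_unfrozen` ones.

    `blocks` can be any sequence of nn.Modules -- for the transformers
    T5 implementation this would be model.encoder.block or
    model.decoder.block (the per-layer ModuleLists).
    """
    cutoff = len(blocks) - num_unfrozen
    for i, block in enumerate(blocks):
        for param in block.parameters():
            # Only the top `num_unfrozen` blocks keep their gradients.
            param.requires_grad = i >= cutoff
```

Calling this with an increasing num_unfrozen at each epoch boundary would reproduce the gradual-unfreezing schedule; Trainer itself does not offer a flag for it.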