Fine-tuning T5 with Trainer for novel task

Hi guys,

I am trying to fine-tune T5 with Hugging Face’s Trainer class, reusing as much of the existing training code as possible.

However, I am wondering what the Trainer.train() method actually does. In the T5 paper the authors mention three fine-tuning methods that they used (§3.5.1):

  • training only additional adapter layers
  • gradually unfreezing the model and training more and more layers
  • training the whole model right away

What does the Huggingface Trainer.train() do? And is there a simple way of switching between strategies?

By default, Trainer.train() trains the entire model (i.e. all layers at once), which corresponds to the third strategy from the paper. There is no built-in switch for the other two.
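You can verify this by counting trainable parameters before training. A minimal sketch (the helper name `count_trainable` is my own; the same check works on a loaded `T5ForConditionalGeneration`, shown here with a stand-in module to keep it self-contained):

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> tuple:
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

# On a freshly loaded model every parameter has requires_grad=True,
# so the two counts are equal -- Trainer will update all of them.
```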

To freeze certain layers before calling Trainer.train(), disable gradients on their parameters:

for name, param in model.named_parameters():
    if name == "...":  # match the parameter name(s) you want to freeze
        param.requires_grad = False
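For the gradual-unfreezing strategy, one approach is to adjust requires_grad between epochs, e.g. from a TrainerCallback's on_epoch_begin hook (the model is passed to callbacks via kwargs). A rough sketch of the freezing logic, kept torch-only so it is self-contained (the helper name is my own, and model.encoder.block / model.decoder.block are assumptions about the transformers T5 module layout):

```python
import torch.nn as nn

def unfreeze_last_blocks(blocks, num_unfrozen: int) -> None:
    """Freeze every block except the last `num_unfrozen` ones.

    `blocks` can be any sequence of nn.Modules -- for the transformers
    T5 implementation this would be model.encoder.block or
    model.decoder.block (the per-layer ModuleLists).
    """
    cutoff = len(blocks) - num_unfrozen
    for i, block in enumerate(blocks):
        for param in block.parameters():
            # Only the top `num_unfrozen` blocks keep their gradients.
            param.requires_grad = i >= cutoff
```

Calling this with an increasing num_unfrozen at each epoch boundary would reproduce the gradual-unfreezing schedule; Trainer itself does not offer a flag for it.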