I am trying to fine-tune T5 with Huggingface’s Trainer class, aiming to reuse as much of the standard training code as possible.
However, I am wondering what the
Trainer.train() method actually does. In the T5 paper the authors mention three fine-tuning approaches that they used (§3.5.1):
- training only additional adapter layers
- gradually unfreezing the model and training more and more layers
- training the whole model right away
Which of these does the Huggingface
Trainer.train() method do? And is there a simple way of switching between these strategies?
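For illustration, here is roughly what I imagine a manual "partial freezing" setup might look like before handing the model to the Trainer. This is just a sketch of my assumption (the layer choice and `my_dataset` are placeholders, and I am not sure whether Trainer actually respects `requires_grad=False`):

```python
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments

model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Assumed "partial unfreezing": freeze everything, then unfreeze only the
# last decoder block and the LM head for a first training phase.
for param in model.parameters():
    param.requires_grad = False
for param in model.decoder.block[-1].parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True

training_args = TrainingArguments(output_dir="out", num_train_epochs=1)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_dataset,  # placeholder: a tokenized seq2seq dataset
)
trainer.train()
```

Is something like this the intended way to get the frozen-layer behaviour, or does Trainer.train() already provide an option for it?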