Do you train all layers when fine-tuning T5?

I haven’t seen many experiments on this, but IMO it’s better to fine-tune the whole model.

Also, when you pass the labels argument to T5ForConditionalGeneration's forward method, it computes the loss for you and returns it as the first value in the output (outputs.loss, or the first element of the returned tuple in older versions).

You can also use the finetune.py script here to fine-tune T5 and other seq2seq models.

See this thread: T5 Finetuning Tips