I haven’t seen many experiments on this, but IMO it’s better to fine-tune the whole model.
Also, when you pass the `labels` argument to `T5ForConditionalGeneration`'s `forward` method, it calculates the loss for you and returns it as the first value in the returned tuple.
You can also use the `finetune.py` script here to fine-tune T5 and other seq2seq models.
See this thread: T5 Finetuning Tips.