I'm looking into fine-tuning a UMT5 (encoder-decoder) model for a translation task.
Has anyone explored the difference in end-task performance when fine-tuning such an (unsupervised pre-trained) encoder-decoder model in either of these settings?
- whole-model fine-tuning
- decoder-only fine-tuning (encoder frozen), as sketched below
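
For clarity, here is a minimal sketch of what I mean by the decoder-only setting, using the Hugging Face `transformers` UMT5 classes. The checkpoint name is just an example, and re-enabling the shared embedding is my own assumption about what "decoder-only" should cover, since (U)MT5 ties the input embeddings across encoder and decoder:

```python
import torch
from transformers import UMT5ForConditionalGeneration

# Example checkpoint; any UMT5 size would work the same way.
model = UMT5ForConditionalGeneration.from_pretrained("google/umt5-small")

# Decoder-only setting: freeze every encoder parameter so that only the
# decoder (and LM head) receive gradient updates.
for param in model.get_encoder().parameters():
    param.requires_grad = False

# Assumption: (U)MT5 shares its input embedding matrix between encoder and
# decoder, so freezing the encoder also freezes it. Re-enable it here if the
# decoder should keep learning embeddings; drop this line to freeze it too.
model.shared.weight.requires_grad = True

# The optimizer only sees the still-trainable parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# Sanity check: how much of the model is actually being trained?
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} / {total:,}")
```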