Here’s what I think:

- The default task for `finetune.py` is summarization, and it uses the `generate` parameters for summarization tasks, which are not useful here.
- An `eval_max_gen_length` of 142 seems too large for a QA task; it should be lower IMO.
- Using `beam_search` might not give good results for QA; in the T5 paper they used greedy decoding for QA.

When calling `generate`, it could be picking up the summarization `generate` parameters, which could explain the longer answers.

Try greedy decoding with `generate`: set `num_beams` to 1, use a smaller `max_length` (32 should be enough; for SQuAD, 16 is fine), and set `length_penalty` to 0.
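As a minimal sketch of what that looks like outside the `finetune.py` script (model name and question/context input here are just illustrative, not from the original issue):

```python
# Sketch: greedy decoding for a T5-style QA model with `transformers`.
# "t5-small" and the example input are placeholders for illustration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

text = "question: What is the capital of France? context: Paris is the capital of France."
inputs = tokenizer(text, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    num_beams=1,         # greedy decoding, as in the T5 paper for QA
    max_length=32,       # short answers; 16 is typically enough for SQuAD
    length_penalty=0.0,  # disable the summarization-style length bonus
)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```

The same settings can be passed through the script's eval flags instead of a manual `generate` call; the key point is overriding the summarization defaults.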
LMK if this helps