Issue with finetuning a seq-to-seq model

Here’s what I think:

  1. The default task for finetune.py is summarization, so it uses the generate parameters for summarization tasks, which are not useful here.
  2. The default eval_max_gen_length of 142 seems too large for a QA task; it should be lower IMO.
  3. Using beam search might not give good results for QA; in the T5 paper they used greedy decoding for QA.

When calling generate, it could be using the summarization generate parameters, which would explain the longer answers.
Try greedy decoding with generate: set num_beams to 1, use a smaller max_length (32 should be enough; for SQuAD 16 is fine), and set length_penalty to 0.
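Something like this sketch (the kwargs dict name is just for illustration; pass them to your model's `generate` call):

```python
# Hypothetical greedy-decoding settings for QA, to override the
# summarization defaults baked into finetune.py:
qa_generate_kwargs = dict(
    num_beams=1,         # greedy decoding, as used for QA in the T5 paper
    do_sample=False,     # no sampling, deterministic output
    max_length=32,       # 32 is enough; 16 is fine for SQuAD-style answers
    length_penalty=0.0,  # don't reward longer sequences
)

# usage: model.generate(input_ids, **qa_generate_kwargs)
```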

LMK if this helps.