BLEU evaluation with multiple references

sunhaozhepy · June 12, 2022, 12:17pm

Hi,

I’m trying to train a T5 model on a seq2seq task. The dataset has multiple ground-truth references for each input. For training, I split the references into separate examples to get more training data. For validation and testing, I want to calculate the BLEU score against all references, and during validation I want to save the model with the highest BLEU score on the validation set. This raises two problems:

The standard DataCollatorForSeq2Seq can’t deal with this because the labels are 3-dimensional: each example in the batch carries several references of different lengths. The following is an example:

'references': [[tensor([3613,    8, 4963,   13,    8, 4033,    1]),
   tensor([  320,    21,     3,     9,  9717,  2195, 17041,     1]),
   tensor([  661,   550,    45,     8,     3, 25895,  3797,     1])],
  [tensor([   34,   808,     3, 13287,  5600,    11,     3,    29,  6833,    81,
             460,   676,    12,   129,    12,  1455,     5,     1]),
   tensor([  34,   47, 3412,   53, 7501,  116,    3,   29, 6833, 3030,   12,  129,
             95,    5,    1]),
   tensor([   8, 1282, 1969,  263,   34, 1256,   21, 9635,   52,    7,   12,  253,
              3,   29, 6833,    5,    1])],
  [tensor([   79,   261, 16352,     7,    12,   199,  2331,  9321,     7,     5,
               1]),
   tensor([  79, 2139, 7208, 7479,   70, 9321,    7,    5,    1]),
   tensor([  79,  356,   95,   46, 1470,  718, 1131, 2269,    5,    1])]]}

I don’t know how to use this configuration in the Trainer API: is there a way not to calculate the validation loss, and only calculate the BLEU score?
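One possible way around the collator problem, sketched here under assumptions rather than as a confirmed solution: never put all references into the labels at all. Training uses one (source, reference) pair per reference, as described above, and the full reference lists are kept on the side as plain strings for scoring. The field names `source` and `references` and both helper functions are hypothetical and not part of any library.

```python
from typing import Dict, List, Tuple

Example = Dict[str, object]  # hypothetical layout: {"source": str, "references": List[str]}


def flatten_for_training(examples: List[Example]) -> List[Tuple[str, str]]:
    """Return one (source, target) pair per reference.

    Every pair has a single target sequence, so the labels stay
    2-dimensional once tokenized and padded.
    """
    return [
        (ex["source"], ref)
        for ex in examples
        for ref in ex["references"]
    ]


def references_for_scoring(examples: List[Example]) -> List[List[str]]:
    """Return the reference lists as plain strings, in example order.

    These never go through the data collator; they are only compared
    against decoded model outputs when computing BLEU.
    """
    return [list(ex["references"]) for ex in examples]


if __name__ == "__main__":
    data = [
        {"source": "input one", "references": ["ref a", "ref b", "ref c"]},
        {"source": "input two", "references": ["ref d", "ref e"]},
    ]
    print(flatten_for_training(data))
    print(references_for_scoring(data))
```

With this layout, the evaluation set keeps one row per source, so the i-th generated output lines up with the i-th reference list.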

lhoestq · July 4, 2022, 2:09pm

Hi! Feel free to ask questions about the Trainer API in the 🤗Transformers section.

Have you tried taking a look at the compute_metrics argument of the Seq2SeqTrainer?
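A minimal sketch of what such a compute_metrics function could look like with multiple references. It assumes the generated token ids arrive in `eval_pred.predictions` (as they do when `predict_with_generate=True` is set), and that the reference strings are held outside the batch in the same order as the evaluation set. `make_compute_metrics` and the `decode` callable are hypothetical; `decode` stands in for whatever turns one row of ids into text. The BLEU here is a simplified pure-Python version (whitespace tokens, no smoothing), so its numbers will not match an established tool such as sacrebleu, which would normally be the better choice.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU in [0, 1]; references[i] holds all references for hypotheses[i]."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_tok = hyp.split()
        refs_tok = [r.split() for r in refs]
        hyp_len += len(hyp_tok)
        # Brevity penalty uses the reference length closest to the hypothesis
        # (ties resolved in favour of the shorter reference).
        ref_len += min((abs(len(r) - len(hyp_tok)), len(r)) for r in refs_tok)[1]
        for n in range(1, max_n + 1):
            hyp_counts = _ngrams(hyp_tok, n)
            max_ref_counts = Counter()
            for r in refs_tok:
                max_ref_counts |= _ngrams(r, n)  # element-wise max over references
            # Clip each hypothesis n-gram count by its maximum count in any reference.
            matches[n - 1] += sum((hyp_counts & max_ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


def make_compute_metrics(
    decode: Callable[[Sequence[int]], str],
    references: List[List[str]],
) -> Callable[[object], Dict[str, float]]:
    """Build a compute_metrics function that scores against all references."""

    def compute_metrics(eval_pred) -> Dict[str, float]:
        hypotheses = [decode(ids) for ids in eval_pred.predictions]
        return {"bleu": corpus_bleu(hypotheses, references)}

    return compute_metrics
```

The function returned by `make_compute_metrics(decode, references)` is what would be passed as the `compute_metrics` argument. The `bleu` key it returns could then be named in the `metric_for_best_model` training argument if the best checkpoint should be chosen by BLEU.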

sunhaozhepy · July 5, 2022, 7:55am

Hi,

Thanks for your reply! Basically, I’ve decided not to evaluate during training, but to evaluate each saved checkpoint after training instead.
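A rough sketch of that post-training loop, assuming the checkpoints sit in directories named `checkpoint-<step>` inside the output directory (the Trainer's default naming). `list_checkpoints`, `best_checkpoint` and `score_fn` are hypothetical names; `score_fn` is a placeholder for code that would load the model from a checkpoint path, generate on the validation set and return its BLEU score.

```python
import re
from pathlib import Path
from typing import Callable, List, Tuple, Union

_CHECKPOINT_NAME = re.compile(r"checkpoint-(\d+)")


def list_checkpoints(output_dir: Union[str, Path]) -> List[Path]:
    """Return checkpoint directories sorted by training step (numerically)."""
    found = []
    for path in Path(output_dir).iterdir():
        match = _CHECKPOINT_NAME.fullmatch(path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def best_checkpoint(
    output_dir: Union[str, Path],
    score_fn: Callable[[Path], float],
) -> Tuple[float, Path]:
    """Score every checkpoint with score_fn and return (best_score, best_path)."""
    scored = [(score_fn(path), path) for path in list_checkpoints(output_dir)]
    if not scored:
        raise ValueError(f"no checkpoint-<step> directories found in {output_dir}")
    # Higher is better for BLEU, so take the maximum score.
    return max(scored, key=lambda item: item[0])
```

Sorting by the numeric step rather than by name matters here, since a plain string sort would place `checkpoint-1000` before `checkpoint-500`.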
