I’m trying to fine-tune LLaMA, and I want to evaluate both the `eval_loss` and the BLEU score during training, where the former needs teacher forcing while the latter does not.
I find that `Seq2SeqTrainer` simultaneously executes `model(**inputs).loss` to compute the evaluation loss and `model.generate(**inputs)` to compute the generated tokens. The `inputs` differ between the two calls: `model(**inputs).loss` requires the inputs to include the target tokens, while `model.generate(**inputs)` requires the inputs not to include them.
Take the sentence `I love you, do you` as an example. I train LLaMA with `I love you,` as context and `do you` as target. When evaluating, I want to inspect:

- log[p(do | I love you,)] + log[p(you | I love you, do)], i.e. the `eval_loss`, which requires `I love you, do you` as input;
- the generation results from p(* | I love you,), and the BLEU score computed on those generations, which only requires `I love you,` as input.
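To make the conflict concrete, here is a minimal sketch of the two input variants a decoder-only model would need for this example. This is my own illustration, not code from `Seq2SeqTrainer`: `build_eval_inputs` is a hypothetical helper and the token ids are made up; the only real convention used is that Hugging Face's loss ignores label positions set to `-100`.

```python
def build_eval_inputs(context_ids, target_ids):
    """Return (loss_inputs, gen_inputs) for one example.

    loss_inputs: the full sequence, with the context positions masked
    out of the labels (-100 is ignored by the cross-entropy loss), so
    the loss covers only the target tokens under teacher forcing.
    gen_inputs: the context only, so generation cannot see the target.
    """
    full_ids = context_ids + target_ids
    labels = [-100] * len(context_ids) + target_ids
    loss_inputs = {"input_ids": full_ids, "labels": labels}
    gen_inputs = {"input_ids": context_ids}
    return loss_inputs, gen_inputs

context = [306, 5360, 366, 29892]  # "I love you," (made-up ids)
target = [437, 366]                # "do you" (made-up ids)
loss_inputs, gen_inputs = build_eval_inputs(context, target)
```

So the loss call would see `I love you, do you` while the generate call would see only `I love you,` — the same batch cannot directly serve both.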
How can I resolve this conflict so as to evaluate both the `eval_loss` and the BLEU score in `Seq2SeqTrainer`?