I’m trying to fine-tune LLaMA, and I want to evaluate both the `eval_loss` and the BLEU score during training, where the former needs teacher forcing while the latter does not.
I find that `Seq2SeqTrainer` simultaneously executes `model(**inputs).loss` to compute the evaluation loss and `model.generate(**inputs)` to compute the generated tokens. The `inputs` differ between the two calls: `model(**inputs).loss` requires the inputs to include the target tokens, while `model.generate(**inputs)` requires the inputs not to include them.
Take the sentence `I love you, do you` as an example. I train LLaMA with `I love you,` as context and `do you` as target. When evaluating, I want to inspect:

- log[p(do | I love you,)] + log[p(you | I love you, do)], i.e. the `eval_loss`, which requires `I love you, do you` as input;
- the generation results from p(* | I love you,), and the BLEU score computed on those generations, which only requires `I love you,` as input.
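To make the conflict concrete, here is a minimal sketch of the two input variants a decoder-only model would need for this example. This is my own illustration, not code from `Seq2SeqTrainer`: `build_eval_inputs` is a hypothetical helper and the token ids are made up; the only real convention used is that Hugging Face's loss ignores label positions set to `-100`.

```python
def build_eval_inputs(context_ids, target_ids):
    """Return (loss_inputs, gen_inputs) for one example.

    loss_inputs: the full sequence, with the context positions masked
    out of the labels (-100 is ignored by the cross-entropy loss), so
    the loss covers only the target tokens under teacher forcing.
    gen_inputs: the context only, so generation cannot see the target.
    """
    full_ids = context_ids + target_ids
    labels = [-100] * len(context_ids) + target_ids
    loss_inputs = {"input_ids": full_ids, "labels": labels}
    gen_inputs = {"input_ids": context_ids}
    return loss_inputs, gen_inputs

context = [306, 5360, 366, 29892]  # "I love you," (made-up ids)
target = [437, 366]                # "do you" (made-up ids)
loss_inputs, gen_inputs = build_eval_inputs(context, target)
```

So the loss call would see `I love you, do you` while the generate call would see only `I love you,` — the same batch cannot directly serve both.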
How can I resolve this conflict so as to evaluate both the `eval_loss` and the BLEU score in `Seq2SeqTrainer`?