Hi everyone, I’m fine-tuning XLNet for generation. For training, I’ve edited the permutation_mask to predict the target sequence one word at a time. I’m evaluating my trained model and am trying to decide between trainer.evaluate() and model.generate(). Running the same input/model with both methods yields different predicted tokens. Is it correct that trainer.evaluate() is not set up for sequential generation? I’ll switch my evaluation code to use model.generate() if that’s the case. Thanks for the help!
Hi, I encountered a similar problem when trying to use EncoderDecoderModel for seq2seq tasks. It seems that Trainer does not support text-generation tasks for now, as the examples page https://huggingface.co/transformers/examples.html shows.
There’s a PR for that; you can try to use it.
Is there any update regarding this topic? I would like to train a VisionEncoderDecoderModel for image captioning and measure the BLEU metric during evaluation. The EvalPrediction object I get in compute_metrics only contains the logits, not the generated texts or tokens (i.e. the result of a beam search). I would assume that computing metrics on the output of generate is not uncommon. The PR mentioned in this thread seems to be stale, and there have been quite a few changes to Trainer since it was proposed.
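For reference, Seq2SeqTrainer supports passing predict_with_generate=True in Seq2SeqTrainingArguments, in which case the predictions handed to compute_metrics are generated token ids rather than logits. Below is a minimal, self-contained sketch of a compute_metrics built on that assumption; the ToyTokenizer and the unigram-precision metric are stand-ins I made up to keep the example runnable, not the real tokenizer or a real BLEU implementation:

```python
from collections import Counter

class ToyTokenizer:
    # stand-in for a real tokenizer; id 0 plays the role of a pad/special token
    vocab = {0: "<pad>", 1: "a", 2: "cat", 3: "sat", 4: "dog"}

    def batch_decode(self, ids, skip_special_tokens=True):
        return [
            " ".join(self.vocab[i] for i in seq
                     if not (skip_special_tokens and i == 0))
            for seq in ids
        ]

tokenizer = ToyTokenizer()

def unigram_precision(pred: str, ref: str) -> float:
    # toy stand-in for BLEU: fraction of predicted words found in the reference
    p, r = Counter(pred.split()), Counter(ref.split())
    overlap = sum(min(c, r[w]) for w, c in p.items())
    return overlap / max(sum(p.values()), 1)

def compute_metrics(eval_pred):
    # with predict_with_generate=True, preds are generated token ids
    preds, labels = eval_pred
    pred_str = tokenizer.batch_decode(preds)
    label_str = tokenizer.batch_decode(labels)
    scores = [unigram_precision(p, l) for p, l in zip(pred_str, label_str)]
    return {"unigram_precision": sum(scores) / len(scores)}

print(compute_metrics(([[1, 2, 3, 0]], [[1, 2, 3, 0]])))  # → {'unigram_precision': 1.0}
```

In a real setup you would replace ToyTokenizer with the model's tokenizer and unigram_precision with a proper metric, and pass compute_metrics to Seq2SeqTrainer.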
Hi @cgawron, you can take a look at my TrOCR notebooks here: Transformers-Tutorials/TrOCR at master · NielsRogge/Transformers-Tutorials · GitHub. They include several example notebooks on fine-tuning TrOCR (which is an instance of VisionEncoderDecoderModel). I have a notebook using the Seq2SeqTrainer, as well as one using native PyTorch. In both cases, I illustrate how to compute metrics using generate (in the notebooks I use CER, but you can easily replace it with something like ROUGE or BLEU).
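For completeness, here is a minimal self-contained CER sketch (not the implementation used in the notebooks); it assumes predictions and references are already decoded strings, e.g. via tokenizer.batch_decode on the output of generate:

```python
def cer(pred: str, ref: str) -> float:
    """Character error rate: Levenshtein distance over characters,
    normalized by the reference length."""
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))  # dp[j] = edit distance of pred[:i] vs ref[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                        # deletion
                dp[j - 1] + 1,                    # insertion
                prev + (pred[i - 1] != ref[j - 1])  # substitution / match
            )
            prev = cur
    return dp[n] / max(n, 1)

print(cer("hello", "hallo"))  # one substitution over 5 chars → 0.2
```

Libraries such as jiwer or evaluate provide tested CER implementations; this is just to show what the metric computes.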