Trainer vs Seq2SeqTrainer


If I am not mistaken, there are two types of trainers in the library: the standard Trainer and the Seq2SeqTrainer.

It seems that the Trainer works for every model since I am using it for a Seq2Seq model (T5).

My question is: what advantages does Seq2SeqTrainer have over the standard one?

And why doesn't the library handle the switch in the background, or does it?
I mean, the user could use Trainer all the time, and in the background it would become a Seq2SeqTrainer if the corresponding model needs it.

Thank you!

Hi @berkayberabi

You are right: in general, Trainer can be used to train almost any model in the library, including seq2seq models.

Seq2SeqTrainer is a subclass of Trainer and provides the following additional features:

  • lets you use SortishSampler
  • lets you compute generative metrics such as BLEU, ROUGE, etc. by doing generation inside the evaluation loop.

The reason this is a separate class is that calculating generative metrics requires generating with the .generate method in the prediction step, which differs from how other models do prediction. Supporting this means overriding the prediction-related methods (prediction_step, predict) to customize the behaviour, hence the Seq2SeqTrainer.
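To make that concrete, here is a minimal, library-free sketch of the idea. ToyModel, ToyTrainer and ToyGenerativeTrainer are hypothetical stand-ins and not the actual transformers code; they only illustrate why prediction_step has to be overridden when evaluation should call generate instead of doing a single forward pass.

```python
EOS = 0  # assumed end-of-sequence token id for this toy example


class ToyModel:
    """Hypothetical stand-in for a seq2seq model; it simply copies its input."""

    def next_token(self, input_ids, prefix):
        # Predict the token that follows `prefix` in the output sequence.
        return input_ids[len(prefix)] if len(prefix) < len(input_ids) else EOS

    def forward(self, input_ids, labels):
        # Teacher forcing: position i is predicted from the gold prefix labels[:i],
        # so the output always has the same length as the labels.
        return [self.next_token(input_ids, labels[:i]) for i in range(len(labels))]

    def generate(self, input_ids, max_length=10):
        # Autoregressive decoding: each prediction is fed back in as the next prefix.
        output = []
        while len(output) < max_length:
            token = self.next_token(input_ids, output)
            output.append(token)
            if token == EOS:
                break
        return output


class ToyTrainer:
    """Mimics the default behaviour: evaluate with a single forward pass."""

    def __init__(self, model):
        self.model = model

    def prediction_step(self, inputs):
        return self.model.forward(inputs["input_ids"], inputs["labels"])


class ToyGenerativeTrainer(ToyTrainer):
    """Overrides prediction_step so evaluation sees generated sequences."""

    def prediction_step(self, inputs):
        return self.model.generate(inputs["input_ids"])


batch = {"input_ids": [5, 6, 7], "labels": [5, 6]}
print(ToyTrainer(ToyModel()).prediction_step(batch))            # [5, 6]
print(ToyGenerativeTrainer(ToyModel()).prediction_step(batch))  # [5, 6, 7, 0]
```

The forward pass needs the labels and returns one prediction per label position, while the generative version decodes freely from the inputs alone, which is the kind of output metrics like BLEU or ROUGE are computed on.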

Hope this answers your question.


Hi @valhalla

Thanks a lot for your fast reply. I understand the need. I am using my own methods to compute the metrics, and they are different from the common ones. So it would not be relevant for me, as far as I understand.

Indeed. Also note that some of the specific features (like sortish sampling) will be integrated with Trainer at some point, so Seq2SeqTrainer is mostly about predict_with_generate.
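As a side note, if a custom metric ever needs to be computed on generated text, it would receive generated token ids rather than logits. Below is a rough sketch of what such a metric function could look like. It assumes the function is handed a (predictions, labels) pair of token-id sequences and that ignored label positions are marked with -100; the vocabulary, the decode helper and the exact-match metric are made up for illustration, and a real setup would use the model's tokenizer.

```python
IGNORE_INDEX = -100  # label positions that should be skipped when decoding

# Hypothetical toy vocabulary; a real setup would use the model's tokenizer.
VOCAB = {0: "<pad>", 1: "</s>", 2: "hello", 3: "world", 4: "there"}
SPECIAL_IDS = {0, 1}


def decode(token_ids):
    # Drop ignored positions and special tokens, then join the remaining words.
    words = [
        VOCAB[t] for t in token_ids
        if t != IGNORE_INDEX and t not in SPECIAL_IDS
    ]
    return " ".join(words)


def compute_metrics(eval_pred):
    # eval_pred is assumed to be a (generated_ids, label_ids) pair.
    predictions, labels = eval_pred
    pred_texts = [decode(p) for p in predictions]
    label_texts = [decode(l) for l in labels]
    matches = sum(p == l for p, l in zip(pred_texts, label_texts))
    return {"exact_match": matches / len(label_texts)}


generated = [[2, 3, 1, 0], [2, 4, 1, 0]]
references = [[2, 3, 1, -100], [2, 3, 1, -100]]
print(compute_metrics((generated, references)))  # {'exact_match': 0.5}
```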
