Trainer vs Seq2SeqTrainer

Hi,

If I am not mistaken, there are two types of trainers in the library: the standard Trainer and the Seq2SeqTrainer.

It seems that the Trainer works for every model since I am using it for a Seq2Seq model (T5).

My question is: What advantages does Seq2SeqTrainer have over the standard one?

And why doesn't the library handle the switch in the background, or does it?
I mean, could the user just use Trainer all the time, and in the background it would become a Seq2SeqTrainer if the corresponding model needs it?

Thank you!

7 Likes

Hi @berkayberabi

You are right: in general, Trainer can be used to train almost any library model, including seq2seq ones.

Seq2SeqTrainer is a subclass of Trainer and provides the following additional features.

  • lets you use SortishSampler
  • lets you compute generative metrics such as BLEU, ROUGE, etc. by doing generation inside the evaluation loop.

The reason for adding this as a separate class is that to calculate generative metrics we need to do generation with the .generate method in the prediction step, which is different from how other models do prediction. To support this, you need to override the prediction-related methods (such as prediction_step and predict) to customize the behaviour, hence the Seq2SeqTrainer.
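For illustration, here is a minimal sketch of that setup (the t5-small checkpoint, the toy dataset and the exact-match stand-in metric are just placeholders): with predict_with_generate=True the evaluation loop calls .generate(), so compute_metrics receives generated token ids instead of logits.

    # Minimal sketch: placeholder checkpoint, toy data and a dummy exact-match
    # metric. The point is that with predict_with_generate=True the evaluation
    # loop calls model.generate(), so compute_metrics gets token ids to decode.
    import numpy as np
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Toy dataset purely for illustration.
    raw = Dataset.from_dict(
        {"src": ["translate English to German: Hello world"], "tgt": ["Hallo Welt"]}
    )

    def preprocess(batch):
        model_inputs = tokenizer(batch["src"], truncation=True)
        model_inputs["labels"] = tokenizer(text_target=batch["tgt"], truncation=True)["input_ids"]
        return model_inputs

    tokenized = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

    def compute_metrics(eval_pred):
        preds, labels = eval_pred
        # preds are generated token ids (not logits) thanks to predict_with_generate.
        # Both preds and labels can be padded with -100, which cannot be decoded.
        preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        # Plug in BLEU/ROUGE/your own metric here; exact match is just a stand-in.
        exact = sum(p.strip() == l.strip() for p, l in zip(decoded_preds, decoded_labels))
        return {"exact_match": exact / len(decoded_preds)}

    args = Seq2SeqTrainingArguments(
        output_dir="tmp_seq2seq",
        predict_with_generate=True,  # the Seq2SeqTrainer-specific switch
        per_device_eval_batch_size=1,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        eval_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    print(trainer.evaluate())  # generation happens inside the evaluation loop

The -100 handling before decoding mirrors what the official translation/summarization example scripts do.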

Hope this answers your question.

9 Likes

Hi @valhalla

Thanks a lot for your fast reply. I understand the need. I am using my own methods to compute the metrics, and they are different from the common ones, so as far as I understand it would not be relevant for me.

1 Like

Indeed. Also note that some of the specific features (like sortish sampling) will be integrated with Trainer at some point, so Seq2SeqTrainer is mostly about predict_with_generate.

6 Likes

@sgugger @valhalla, correct me if I am wrong, but one should be perfectly fine using Seq2SeqTrainer to train decoder-only models if they wish to compute custom metrics which require the generated token sequences? I have just read through the Seq2SeqTrainer implementation in 4.35.2 and I see the custom prediction_step implementation discussed above. It seems it should work with both types of transformers, with the only custom logic for encoder-decoder models being:

        # If the `decoder_input_ids` was created from `labels`, evict the former, so that the model can freely generate
        # (otherwise, it would continue generating from the padded `decoder_input_ids`)
        if (
            "labels" in generation_inputs
            and "decoder_input_ids" in generation_inputs
            and generation_inputs["labels"].shape == generation_inputs["decoder_input_ids"].shape
        ):
            generation_inputs = {
                k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
            }

I think I was a bit confused in thinking that Seq2SeqTrainer is what you use to train “sequence-to-sequence” transformers (aka encoder-decoder architectures), but in fact it’s just a nifty subclass we can use to train both types of models if we wish to predict output sequences for computing sequence-level metrics (e.g. BLEU and friends). Correct me if I’m wrong :slight_smile:
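On the metrics side, a small usage sketch (reusing the trainer, tokenizer and tokenized dataset assumed in the example further up): with predict_with_generate=True, trainer.predict returns generated token ids that you can decode and feed into your own sequence-level metrics.

    import numpy as np

    # Usage sketch: assumes the `trainer`, `tokenizer` and `tokenized` dataset
    # from the earlier sketch. With predict_with_generate=True, `predictions`
    # holds generated token ids rather than logits.
    output = trainer.predict(tokenized)
    # Predictions can be padded with -100 across batches; swap that out first.
    preds = np.where(output.predictions != -100, output.predictions, tokenizer.pad_token_id)
    decoded = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Feed `decoded` into whatever sequence-level metric you like (BLEU, ROUGE, ...).
    print(decoded)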

1 Like