Trainer vs Seq2SeqTrainer

Hi,

If I am not mistaken, there are two types of trainers in the library: the standard Trainer and the Seq2SeqTrainer.

It seems that the Trainer works for every model since I am using it for a Seq2Seq model (T5).

My question is: What advantages does Seq2SeqTrainer have over the standard one?

And why doesn't the library handle the switch in the background, or does it?
I mean, could the user just use Trainer all the time, and in the background it would become a Seq2SeqTrainer if the corresponding model needs it?

Thank you!

7 Likes

Hi @berkayberabi

You are right: in general, Trainer can be used to train almost any library model, including seq2seq ones.

Seq2SeqTrainer is a subclass of Trainer and provides the following additional features.

  • lets you use SortishSampler
  • lets you compute generative metrics such as BLEU, ROUGE, etc. by doing generation inside the evaluation loop.

The reason for adding this as a separate class is that to calculate generative metrics we need to do generation with the .generate method in the prediction step, which is different from how other models do prediction. To support this, you need to override the prediction-related methods (such as prediction_step and predict) to customize the behaviour, hence the Seq2SeqTrainer.
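For illustration, here is a minimal sketch of that setup (the t5-small checkpoint, the toy dataset and the exact-match stand-in metric are just placeholders): with predict_with_generate=True the evaluation loop calls .generate(), so compute_metrics receives generated token ids instead of logits.

    # Minimal sketch: placeholder checkpoint, toy data and a dummy exact-match
    # metric. The point is that with predict_with_generate=True the evaluation
    # loop calls model.generate(), so compute_metrics gets token ids to decode.
    import numpy as np
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Toy dataset purely for illustration.
    raw = Dataset.from_dict(
        {"src": ["translate English to German: Hello world"], "tgt": ["Hallo Welt"]}
    )

    def preprocess(batch):
        model_inputs = tokenizer(batch["src"], truncation=True)
        model_inputs["labels"] = tokenizer(text_target=batch["tgt"], truncation=True)["input_ids"]
        return model_inputs

    tokenized = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

    def compute_metrics(eval_pred):
        preds, labels = eval_pred
        # preds are generated token ids (not logits) thanks to predict_with_generate.
        # Both preds and labels can be padded with -100, which cannot be decoded.
        preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        # Plug in BLEU/ROUGE/your own metric here; exact match is just a stand-in.
        exact = sum(p.strip() == l.strip() for p, l in zip(decoded_preds, decoded_labels))
        return {"exact_match": exact / len(decoded_preds)}

    args = Seq2SeqTrainingArguments(
        output_dir="tmp_seq2seq",
        predict_with_generate=True,  # the Seq2SeqTrainer-specific switch
        per_device_eval_batch_size=1,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        eval_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    print(trainer.evaluate())  # generation happens inside the evaluation loop

The -100 handling before decoding mirrors what the official translation/summarization example scripts do.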

Hope this answers your question.

9 Likes

Hi @valhalla

Thanks a lot for your fast reply. I understand the need. I am using my own methods to compute the metrics, and they are different from the common ones, so as far as I understand it would not be relevant for me.

1 Like

Indeed. Also note that some of the specific features (like sortish sampling) will be integrated with Trainer at some point, so Seq2SeqTrainer is mostly about predict_with_generate.

6 Likes

@sgugger @valhalla, correct me if I am wrong, but one should be perfectly fine using Seq2SeqTrainer to train decoder-only models if they wish to compute custom metrics which require the generated token sequences? I have just read through the Seq2SeqTrainer implementation in 4.35.2 and I see the custom prediction_step implementation discussed above. It seems it should work with both types of transformers, with the only custom logic for encoder-decoder models being:

        # If the `decoder_input_ids` was created from `labels`, evict the former, so that the model can freely generate
        # (otherwise, it would continue generating from the padded `decoder_input_ids`)
        if (
            "labels" in generation_inputs
            and "decoder_input_ids" in generation_inputs
            and generation_inputs["labels"].shape == generation_inputs["decoder_input_ids"].shape
        ):
            generation_inputs = {
                k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
            }

I think I was a bit confused in thinking that Seq2SeqTrainer is what you use to train “sequence-to-sequence” transformers (aka encoder-decoder architectures), but in fact it’s just a nifty subclass we can use to train both types of models if we wish to predict output sequences for computing sequence-level metrics (e.g. BLEU and friends). Correct me if I’m wrong :slight_smile:
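On the metrics side, a small usage sketch (reusing the trainer, tokenizer and tokenized dataset assumed in the example further up): with predict_with_generate=True, trainer.predict returns generated token ids that you can decode and feed into your own sequence-level metrics.

    import numpy as np

    # Usage sketch: assumes the `trainer`, `tokenizer` and `tokenized` dataset
    # from the earlier sketch. With predict_with_generate=True, `predictions`
    # holds generated token ids rather than logits.
    output = trainer.predict(tokenized)
    # Predictions can be padded with -100 across batches; swap that out first.
    preds = np.where(output.predictions != -100, output.predictions, tokenizer.pad_token_id)
    decoded = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Feed `decoded` into whatever sequence-level metric you like (BLEU, ROUGE, ...).
    print(decoded)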

1 Like