Hi,
If I am not mistaken, there are two types of trainers in the library: the standard Trainer and the Seq2SeqTrainer.
It seems that the Trainer works for every model, since I am using it for a seq2seq model (T5).
My question is: what advantages does Seq2SeqTrainer have over the standard one?
And why doesn't the library handle the switch in the background, or does it?
I mean that the user could use Trainer all the time and, in the background, it would become a Seq2SeqTrainer if the corresponding model needs it.
Thank you!
Hi @berkayberabi
You are right, in general, Trainer can be used to train almost any library model, including seq2seq models. Seq2SeqTrainer is a subclass of Trainer and provides the following additional features:
- lets you use SortishSampler
- lets you compute generative metrics such as BLEU and ROUGE by doing generation inside the evaluation loop.
The reason to add this as a separate class is that calculating generative metrics requires doing generation with the .generate method in the prediction step, which is different from how other models do prediction. To support this, the prediction-related methods (prediction_step, predict) have to be overridden to customize the behaviour, hence the Seq2SeqTrainer.
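As a minimal sketch (my own, not from this thread) of that generation-based evaluation: the "t5-small" checkpoint, the tiny inline dataset and the exact-match metric below are just placeholders, the point is predict_with_generate plus a compute_metrics that decodes generated ids.

```python
import numpy as np
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Placeholder dataset: one translation pair, tokenized into input_ids/labels.
raw = Dataset.from_dict(
    {"src": ["translate English to German: Hello world"], "tgt": ["Hallo Welt"]}
)

def tokenize(batch):
    model_inputs = tokenizer(batch["src"], truncation=True)
    model_inputs["labels"] = tokenizer(text_target=batch["tgt"], truncation=True)["input_ids"]
    return model_inputs

ds = raw.map(tokenize, batched=True, remove_columns=["src", "tgt"])

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # With predict_with_generate=True, `preds` are generated token ids.
    # -100 is used as padding (in labels and across eval batches), so map it
    # back to the pad token before decoding.
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Plug in any sequence-level metric here (BLEU, ROUGE, a custom one, ...).
    exact = float(np.mean([p.strip() == l.strip() for p, l in zip(decoded_preds, decoded_labels)]))
    return {"exact_match": exact}

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,  # run model.generate() inside the evaluation loop
    per_device_eval_batch_size=4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
)
```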
Hope this answers your question.
hi @valhalla
Thanks a lot for your fast reply. I understand the need for it. I am using my own methods to compute the metrics, and they are different from the common ones, so as far as I understand it would not be relevant for me.
Indeed. Also note that some of the specific features (like sortish sampling) will be integrated with Trainer at some point, so Seq2SeqTrainer is mostly about predict_with_generate.
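For concreteness, continuing the hypothetical trainer sketched earlier in the thread: with predict_with_generate=True, both evaluate() and predict() run generation, and the predictions returned by predict() are generated token ids rather than logits.

```python
metrics = trainer.evaluate()      # compute_metrics receives generated ids
pred_out = trainer.predict(ds)    # pred_out.predictions holds generated ids
# -100 may appear as cross-batch padding; swap it for the pad token before decoding.
preds = np.where(pred_out.predictions != -100, pred_out.predictions, tokenizer.pad_token_id)
print(metrics)
print(tokenizer.batch_decode(preds, skip_special_tokens=True))
```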
@sgugger @valhalla, correct me if I am wrong, but one should be perfectly fine using Seq2SeqTrainer to train decoder-only models if they wish to compute custom metrics which require the generated token sequences? I have just read the Seq2SeqTrainer implementation in 4.35.2 and I see the custom prediction_step implementation discussed above. It seems it should work with both types of transformers, with the only custom logic for encoder-decoder models being:
# If the `decoder_input_ids` was created from `labels`, evict the former, so that the model can freely generate
# (otherwise, it would continue generating from the padded `decoder_input_ids`)
if (
    "labels" in generation_inputs
    and "decoder_input_ids" in generation_inputs
    and generation_inputs["labels"].shape == generation_inputs["decoder_input_ids"].shape
):
    generation_inputs = {
        k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
    }
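As a standalone illustration of what that guard avoids (my own sketch, not library code): when a model is passed, DataCollatorForSeq2Seq builds decoder_input_ids by shifting the labels right, and if those were handed to generate(), the model would simply keep extending them instead of generating freely.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

labels = tok("Hallo Welt", return_tensors="pt").input_ids
# This mirrors what DataCollatorForSeq2Seq does when given a model: shift labels right.
decoder_input_ids = model.prepare_decoder_input_ids_from_labels(labels=labels)

inputs = tok("translate English to German: Hello world", return_tensors="pt")

# Without decoder_input_ids, generation starts from the decoder start token (what
# Seq2SeqTrainer's prediction_step wants); with them, the model continues from the
# teacher-forcing sequence built from the labels.
free_generation = model.generate(**inputs, max_new_tokens=20)
forced_prefix = model.generate(**inputs, decoder_input_ids=decoder_input_ids, max_new_tokens=20)

print(tok.batch_decode(free_generation, skip_special_tokens=True))
print(tok.batch_decode(forced_prefix, skip_special_tokens=True))
```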
I think I was a bit confused in thinking that Seq2SeqTrainer is what you use to train “sequence-to-sequence” transformers (a.k.a. encoder-decoder architectures), but in fact it is just a nifty subclass we can use to train both types of models if we wish to predict output sequences for computing sequence-level metrics (e.g. BLEU and friends). Correct me if I’m wrong.