Hello, I’m using the EncoderDecoderModel for a text summarization task.
I have a question about the loss computation in the Trainer class.
For a text summarization task, as far as I know, the encoder input is the content (the article), while the decoder input and the labels are the summary.
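To make that layout concrete, here is roughly how one training example could be prepared (a minimal sketch; the checkpoint name, sequence lengths, and texts are just placeholders, not my real data):

```python
from transformers import BertTokenizerFast

# Placeholder checkpoint; any tokenizer matching the encoder/decoder would do.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

article = "The full article text goes here ..."
summary = "The reference summary goes here ..."

# Encoder side: the content/article.
model_inputs = tokenizer(article, max_length=512, truncation=True, padding="max_length")

# Decoder side: the summary, passed as labels (and from which the decoder inputs come).
summary_tokens = tokenizer(summary, max_length=128, truncation=True, padding="max_length")
model_inputs["labels"] = summary_tokens["input_ids"]
```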
The EncoderDecoderModel uses a causal LM model (e.g. BertLMHeadModel) as the decoder. In the causal LM model, the loss is computed by shifting the labels and the prediction scores by one position, so that the decoder learns to predict the next token from the decoder inputs.
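Here is a minimal sketch of that next-token loss as I understand it from the causal LM decoder (not the library code itself, just the shifting idea):

```python
import torch
from torch.nn import CrossEntropyLoss

def causal_lm_loss(prediction_scores: torch.Tensor, labels: torch.Tensor, vocab_size: int) -> torch.Tensor:
    # prediction_scores: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    shifted_scores = prediction_scores[:, :-1, :].contiguous()  # positions 0..n-2 predict ...
    shifted_labels = labels[:, 1:].contiguous()                 # ... tokens 1..n-1
    loss_fct = CrossEntropyLoss()
    return loss_fct(shifted_scores.view(-1, vocab_size), shifted_labels.view(-1))
```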
However, in the Trainer class, the labels are first popped out of the inputs dictionary (transformers/trainer.py at master · huggingface/transformers · GitHub). Without labels, the loss is not calculated inside the decoder model (transformers/modeling_bert.py at master · huggingface/transformers · GitHub); instead, it is calculated in Trainer at line 1887. This calculation is different from the one in the decoder's forward pass: there is no shift between the labels and the decoder inputs.
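For reference, this is roughly how I read that part of Trainer.compute_loss (a paraphrased sketch assuming label smoothing is enabled, not a verbatim copy of trainer.py):

```python
# Paraphrased sketch of the compute_loss path I am referring to.
def compute_loss(model, inputs, label_smoother):
    if label_smoother is not None and "labels" in inputs:
        labels = inputs.pop("labels")        # labels removed before the forward pass
    else:
        labels = None
    outputs = model(**inputs)                # decoder receives no labels -> no internal shifted loss
    if labels is not None:
        loss = label_smoother(outputs, labels)   # cross-entropy on logits vs. labels, with no shift applied
    else:
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
    return loss
```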
My question is: how should the decoder inputs and labels be defined in EncoderDecoderModel for a text summarization task? And how should the Trainer be used to fine-tune EncoderDecoderModel for this task?
Thank you.