Encoder-decoder transformers

I want to use RoBERTa as the encoder and GPT as the decoder for a generation task. Besides the decoder (generation) loss, I also want to add a classification task on the encoder side and sum the two losses to train the model. However, looking at the original code, I found that the encoder includes a pooling layer, yet it doesn't seem to do anything: the computation never passes through the pooling layer. Is that correct?
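To make my setup concrete, this is roughly what I have so far (just a sketch: the checkpoints are only examples, and the config fields for label shifting may differ depending on the transformers version):

```python
# Rough sketch of the basic encoder-decoder setup (checkpoints are only examples)
from transformers import EncoderDecoderModel, RobertaTokenizer, GPT2Tokenizer

model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "gpt2")
enc_tok = RobertaTokenizer.from_pretrained("roberta-base")
dec_tok = GPT2Tokenizer.from_pretrained("gpt2")
dec_tok.pad_token = dec_tok.eos_token  # GPT-2 has no pad token by default

# as far as I understand, these are needed so the labels can be shifted
# into decoder_input_ids internally
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id

src = enc_tok("some source text", return_tensors="pt")
tgt = dec_tok("some target text", return_tensors="pt").input_ids

outputs = model(input_ids=src.input_ids,
                attention_mask=src.attention_mask,
                labels=tgt)
print(outputs.loss)  # the decoder (generation) loss
```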

If I first build the encoder-decoder architecture and then pass the encoder's first output ([CLS]) through a pooling and softmax layer for the classification task, is that correct? Something like the sketch below:
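(Again only a rough sketch of what I'm imagining: the wrapper class, the linear head, and the hidden size are made up by me, and I skip the pooler entirely and just take the first token's hidden state.)

```python
import torch
import torch.nn as nn

class EncoderDecoderWithClassifier(nn.Module):
    """Hypothetical wrapper: sum of generation loss and a classification
    loss computed from the encoder's [CLS] hidden state."""

    def __init__(self, enc_dec_model, num_labels, hidden_size=768):
        super().__init__()
        self.enc_dec = enc_dec_model            # e.g. the RoBERTa->GPT-2 model above
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, labels, class_labels):
        # seq2seq forward pass: gives the decoder (generation) loss
        outputs = self.enc_dec(input_ids=input_ids,
                               attention_mask=attention_mask,
                               labels=labels)
        gen_loss = outputs.loss

        # first token of the encoder's last hidden state ([CLS] / <s>),
        # bypassing the pooling layer that never seems to be used
        cls_state = outputs.encoder_last_hidden_state[:, 0, :]
        logits = self.classifier(cls_state)
        cls_loss = nn.functional.cross_entropy(logits, class_labels)

        # train on the sum of the two losses
        return gen_loss + cls_loss, logits
```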

Does anyone have suggestions? Thanks a lot!