Hi all!
Is it possible to use a pretrained BERT model to initialize the encoder part of an encoder-decoder model like BART, leave the decoder uninitialized (i.e. random), and then do fine-tuning on some seq2seq task?
How should I proceed if it's possible? Does someone know of previous instances where something like that has been tried?
Thanks in advance!
Best,
Gabriel.
I am not entirely sure about BART, but you can check out this: transformers/modeling_encoder_decoder.py at master · huggingface/transformers · GitHub
You can also read the publication linked in the comments; I think it is similar to what you want to achieve.
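In case it helps, here is a rough sketch of the kind of setup the EncoderDecoderModel class supports: warm-start the encoder from a pretrained BERT checkpoint and pair it with a randomly initialized BERT-style decoder. The checkpoint name and the special-token settings are just illustrative, not something specific to your task.

```python
from transformers import (
    BertConfig,
    BertLMHeadModel,
    BertModel,
    BertTokenizer,
    EncoderDecoderModel,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encoder: weights loaded from the pretrained checkpoint.
encoder = BertModel.from_pretrained("bert-base-uncased")

# Decoder: same architecture, but randomly initialized, with cross-attention
# enabled so it can attend to the encoder outputs.
decoder_config = BertConfig(is_decoder=True, add_cross_attention=True)
decoder = BertLMHeadModel(decoder_config)

# Tie them together into a single seq2seq model.
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Generation-related special tokens have to be set by hand
# (values here assume the bert-base-uncased tokenizer).
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```

From there you can fine-tune it like any other seq2seq model, e.g. with Seq2SeqTrainer.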
Hi!
Yeah, I eventually found that as well. It is indeed what I had in mind, and the linked papers were super insightful.