BART from finetuned BERT

Hi all!

Is it possible to use a pretrained BERT model to initialize the encoder part of an encoder-decoder model like BART, leaving the decoder uninitialized (or random), and then do fine-tuning on some seq2seq task?

How should I proceed if it's possible? Does anyone know of previous instances where something like this has been tried?

Thanks in advance!

I am not entirely sure about BART, but you can check out this: transformers/ at master · huggingface/transformers · GitHub

You can also read the publication linked in the comments; I think this is similar to what you want to achieve.
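For reference, a minimal sketch of the idea using the `EncoderDecoderModel` class in transformers (the checkpoint names and token settings here are illustrative, not the only option). Note that when the decoder is warm-started from BERT, its cross-attention layers don't exist in the checkpoint and are randomly initialized, so the model still needs seq2seq fine-tuning:

```python
# Warm-start a seq2seq model from pretrained BERT checkpoints.
# Encoder and decoder weights come from BERT; the decoder's
# cross-attention layers are newly created and random.
from transformers import BertTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# BERT has no dedicated generation tokens, so set them explicitly
# before fine-tuning or generating.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```

After this, the model can be fine-tuned like any other seq2seq model (e.g. with `Seq2SeqTrainer`).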

Yeah, I eventually found that as well; it is indeed what I had in mind, and the linked papers were super insightful.