BERT2BERT Notebook for Models without GenerationMixin

Hello,

I have just stumbled upon the “Warm-starting BERT2BERT for CNN/Dailymail” notebook @patrickvonplaten kindly shared with us: https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb

From my understanding, one can use this notebook to fine-tune any model that has a ForConditionalGeneration class, such as T5, Pegasus, ProphetNet, etc. Can you please confirm this?
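
For instance, I assume the only change needed would be swapping in the corresponding model class (using t5-small here purely as an example; the rest of the training loop would stay the same):

from transformers import T5ForConditionalGeneration, T5Tokenizer

# load a seq2seq model that already ships with a ForConditionalGeneration head
model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")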

Moreover, I am interested in the new Longformer and Reformer models, which can be fed much longer sequences. These two models do not have a ForConditionalGeneration class. However, I was wondering whether they could still be fine-tuned on summarization tasks with the same script, e.g. by changing

from transformers import EncoderDecoderModel

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

with

from transformers import LongformerModel, ReformerModel

longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")
reformer = ReformerModel.from_pretrained("google/reformer-enwik8")
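
Or, if these classes cannot generate on their own, could they instead serve as the encoder inside the same EncoderDecoderModel wrapper? This is just a sketch of what I have in mind (I am not sure the decoder checkpoint choice or the cross-attention wiring makes sense here):

from transformers import EncoderDecoderModel

# hypothetical: warm-start a Longformer encoder with a BERT decoder
longformer2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "allenai/longformer-base-4096", "bert-base-uncased"
)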

Thank you for your help! :slight_smile: