BERT2BERT Notebook for Models without GenerationMixin

Hello,

I have just stumbled upon the “Warm-starting BERT2BERT for CNN/Dailymail” notebook @patrickvonplaten kindly shared with us: https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb

From my understanding, one can use this notebook to fine-tune any model that has a ForConditionalGeneration class, such as T5, Pegasus, ProphetNet, etc. Can you please confirm this?
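
For instance, I assume the only change needed would be swapping in the corresponding model class (using t5-small here purely as an example; the rest of the training loop would stay the same):

from transformers import T5ForConditionalGeneration, T5Tokenizer

# load a seq2seq model that already ships with a ForConditionalGeneration head
model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")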

Moreover, I am interested in the new Longformer and Reformer models, which can be fed much longer sequences. These two models do not have a ForConditionalGeneration class. However, I was wondering whether they could still be fine-tuned on summarization tasks with the same script, e.g. by changing

from transformers import EncoderDecoderModel

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

with

from transformers import LongformerModel, ReformerModel

longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")
reformer = ReformerModel.from_pretrained("google/reformer-enwik8")
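
Or, if these classes cannot generate on their own, could they instead serve as the encoder inside the same EncoderDecoderModel wrapper? This is just a sketch of what I have in mind (I am not sure the decoder checkpoint choice or the cross-attention wiring makes sense here):

from transformers import EncoderDecoderModel

# hypothetical: warm-start a Longformer encoder with a BERT decoder
longformer2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "allenai/longformer-base-4096", "bert-base-uncased"
)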

Thank you for your help! :slight_smile: