I tried to warm-start an encoder-decoder model from different kinds of pretrained checkpoints (e.g. an xlnet or roberta encoder with a gpt2 decoder) for the summarization task, based on @patrickvonplaten 's great blog post on using Bert2GPT2 for summarization. All of my experiments ended with very poor ROUGE results (close to zero on all ROUGE scores).

Then I ran exactly the same code as in patrickvonplaten/bert2gpt2-cnn_dailymail-fp16 on Colab Pro+, with only slight modifications to the training arguments and a batch size of 4 instead of 16, and again got poor results (ROUGE-2 = 0.004), whereas I get ROUGE-2 = 15.16 when I load the published model with:
model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2gpt2-cnn_dailymail-fp16")
Can you please help me figure out why I got such different and poor results after training the model for almost 14 hours, even though I'm using the same code as @patrickvonplaten?
Thank you in advance
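For reference, this is roughly how I warm-start the model, a minimal sketch assuming a roberta-base encoder and a gpt2 decoder (the checkpoint names and token settings here are illustrative, following the setup in the blog post):

from transformers import EncoderDecoderModel, AutoTokenizer

# Warm-start: combine a pretrained encoder with a pretrained decoder
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "gpt2")

encoder_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
decoder_tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 has no pad token, so reuse the EOS token for padding
decoder_tokenizer.pad_token = decoder_tokenizer.eos_token

# Special-token config that seq2seq training and generation rely on
model.config.decoder_start_token_id = decoder_tokenizer.bos_token_id
model.config.eos_token_id = decoder_tokenizer.eos_token_id
model.config.pad_token_id = decoder_tokenizer.pad_token_id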
The training arguments I used:
training_args = TrainingArguments(