I’m trying to understand the exact differences among the pre-trained Pegasus models.
As far as I understand (see the loading sketch after this list):
- models like google/pegasus-* (e.g. google/pegasus-xsum) are base models
- all base models are fine-tuned on a dataset (e.g. xsum in the previous example)
- google/pegasus-large is only pretrained (on C4 and HugeNews?)
- sshleifer/distill-pegasus-* and sshleifer/student_pegasus-* are distilled models
- google/bigbird-pegasus-large-* use the BigBird sparse-attention mechanism
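For context, this is roughly how I load these checkpoints, a minimal sketch with the transformers library (google/bigbird-pegasus-large-arxiv is just one concrete member of the bigbird family):

```python
from transformers import (
    AutoTokenizer,
    PegasusForConditionalGeneration,
    BigBirdPegasusForConditionalGeneration,
)

# A google/pegasus-* checkpoint: base PEGASUS architecture, fine-tuned on XSum
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

# A BigBird variant: same seq2seq interface, but sparse (BigBird) attention;
# this checkpoint is fine-tuned on arXiv
bb_model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-arxiv"
)
```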
My questions are the following:
- Is my understanding correct?
- Is there any way to get a base Pegasus that is not fine-tuned on a downstream dataset? (See the sketch after this list for what I’m trying.)
- Is google/pegasus-multi_news multilingual?
- As for the distilled models, what is the difference between a distill-* and a student-* model? And what do the two numbers represent (e.g. in sshleifer/distill-pegasus-cnn-16-4)?
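Regarding the second question, what I’d like is to start from pre-training weights only and fine-tune on my own data. A minimal sketch of what I’m trying, assuming google/pegasus-large is indeed the pretrained-only checkpoint:

```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load what I believe is the pretrained-only (not fine-tuned) checkpoint
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

text = "PEGASUS masks whole sentences during pre-training and learns to generate them."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Summarize; with a pretrained-only checkpoint I'd expect weaker output
# than with a fine-tuned one such as google/pegasus-xsum
summary_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```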
Thank you very much, and thanks for all the work.