I’m trying to understand the exact differences among the pre-trained Pegasus models.
As far as I understand (see the loading sketch after this list):
- models like google/pegasus-* (e.g. google/pegasus-xsum) are base models
- all base models are fine-tuned on a dataset (e.g. xsum in the previous example)
- google/pegasus-large is only pretrained (on C4 and HugeNews?)
- sshleifer/distill-pegasus-* and sshleifer/student_pegasus-* are distilled models
- google/bigbird-pegasus-large-* use the BigBird sparse-attention mechanism
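For context, this is roughly how I load these checkpoints, a minimal sketch with the transformers library (google/bigbird-pegasus-large-arxiv is just one concrete member of the bigbird family):

```python
from transformers import (
    AutoTokenizer,
    PegasusForConditionalGeneration,
    BigBirdPegasusForConditionalGeneration,
)

# A google/pegasus-* checkpoint: base PEGASUS architecture, fine-tuned on XSum
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

# A BigBird variant: same seq2seq interface, but sparse (BigBird) attention;
# this checkpoint is fine-tuned on arXiv
bb_model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-arxiv"
)
```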
My questions are the following:
- Is my understanding correct?
- Is there any way to get a base Pegasus that is not fine-tuned on a downstream dataset? (See the sketch after this list for what I’m trying.)
- Is google/pegasus-multi_news multilingual?
- As for the distilled models, what is the difference between a distill-* and a student-* model? And what do the two numbers represent (e.g. in sshleifer/distill-pegasus-cnn-16-4)?
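Regarding the second question, what I’d like is to start from pre-training weights only and fine-tune on my own data. A minimal sketch of what I’m trying, assuming google/pegasus-large is indeed the pretrained-only checkpoint:

```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load what I believe is the pretrained-only (not fine-tuned) checkpoint
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

text = "PEGASUS masks whole sentences during pre-training and learns to generate them."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Summarize; with a pretrained-only checkpoint I'd expect weaker output
# than with a fine-tuned one such as google/pegasus-xsum
summary_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```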
Thank you very much, and thanks for all the work.