Pegasus Questions

I’m trying to understand exactly the differences among the pre-trained Pegasus models.

As far as I understood:

  • models like google/pegasus-* (e.g. google/pegasus-xsum) are base models
  • all base models are fine-tuned on a dataset (e.g. xsum in the previous example)
  • google/pegasus-large is only pretrained (on C4 and Newsroom?)
  • sshleifer/distill-pegasus-* and sshleifer/student_pegasus-* are distilled models
  • google/bigbird-pegasus-large-* models use the BigBird attention mechanism.

My questions are the following:

  1. Is my understanding correct?
  2. Is there any way to get a base Pegasus which is not fine-tuned on a downstream dataset?
  3. Is google/pegasus-multi_news multilingual?
  4. As for the distilled models, what is the difference between a distill-* and a student-* model? And what do the two numbers represent (e.g. in sshleifer/distill-pegasus-cnn-16-4)?
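For context on question 4, one way I tried to poke at this without downloading full model weights is to fetch just the configs via the transformers library. This is only a sketch: it assumes the checkpoints above exist on the Hub, and my guess (unconfirmed) is that the 16-4 suffix maps to encoder/decoder layer counts:

```python
from transformers import AutoConfig

# from_pretrained with AutoConfig fetches only config.json (a few KB),
# not the multi-GB model weights.
base = AutoConfig.from_pretrained("google/pegasus-large")
distilled = AutoConfig.from_pretrained("sshleifer/distill-pegasus-cnn-16-4")

# google/pegasus-large should report 16 encoder / 16 decoder layers.
print(base.encoder_layers, base.decoder_layers)

# My guess: "16-4" means 16 encoder layers and 4 decoder layers,
# i.e. only the decoder was shrunk during distillation.
print(distilled.encoder_layers, distilled.decoder_layers)
```

If that guess is right, the printed layer counts should differ only on the decoder side.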

Thank you very much, and thanks for all the work!