I have already asked a similar question here, but I think this is actually the correct section to ask it in.
I need to use a pre-trained Pegasus that has not been fine-tuned on any downstream dataset. It seems that this model exists only in its large configuration; is that correct? Is there any way to get an equivalent base model?
As for the distilled models, what is the difference between a distill-* and a student-* model? And what do the two numbers represent (e.g. in sshleifer/distill-pegasus-cnn-16-4)?