Pegasus Model Weights Compression/Pruning

hmm this is a bit odd :grimacing:

how did you do the quantization? i just remembered that someone already asked about quantizing pegasus here, so maybe you can check whether you can dynamically quantize the model in a similar way to how i described it there, and then try generating some outputs with the same model in memory (i.e. don’t save and reload)
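as a rough sketch, dynamic quantization looks something like this — here i'm using a toy `nn.Sequential` as a stand-in so it runs anywhere; for your case you'd pass the actual `PegasusForConditionalGeneration.from_pretrained(...)` model instead:

```python
import torch
import torch.nn as nn

# toy stand-in model; for pegasus you'd instead do
#   model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# dynamically quantize all Linear layers to int8 weights
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# use the quantized model directly in memory -- no save/reload step
x = torch.randn(4, 16)
out = quantized(x)
print(out.shape)  # torch.Size([4, 8])
```

for pegasus the equivalent in-memory check would be calling `quantized.generate(**inputs)` right after quantizing, before any `save_pretrained` round trip.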

if that works, then my guess is that from_pretrained doesn’t support loading quantized models (i can have a look) and you might need to do the loading in native pytorch
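by "loading in native pytorch" i mean something along these lines: save the quantized `state_dict`, then rebuild the same quantized architecture and load the weights into it, instead of going through `from_pretrained`. again a toy model stands in for pegasus, and the filename is just an example:

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # toy stand-in; for pegasus, rebuild the fp32 model via from_pretrained first
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# quantize, then save only the state dict (not the whole pickled module)
quantized = torch.quantization.quantize_dynamic(
    build_model(), {nn.Linear}, dtype=torch.qint8
)
path = os.path.join(tempfile.gettempdir(), "pegasus_quantized_example.pt")
torch.save(quantized.state_dict(), path)

# to reload: re-create the quantized skeleton, then load the saved weights
reloaded = torch.quantization.quantize_dynamic(
    build_model(), {nn.Linear}, dtype=torch.qint8
)
reloaded.load_state_dict(torch.load(path, weights_only=False))
```

the key point is that the loading side has to apply `quantize_dynamic` first so the module structure matches the quantized `state_dict` — loading int8 weights straight into an fp32 model (which is roughly what `from_pretrained` would try) won't work.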