Optimize response time of model output

I am using Parrot Paraphrase, which is built on a transformers model, but result generation is too slow without a GPU. I have no GPU available and can only run inference on CPU. The optimization solutions I have seen all suggest training or fine-tuning the model in some particular way, but this model was already trained by someone else, so I cannot do much there. Recently I improved the performance of another model with PyTorch dynamic quantization, e.g.:

    import torch
    from torch import nn
    from torch.quantization import quantize_dynamic

    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model_quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

Can I use the same approach with the parrot library? The code it uses for model loading is:

    self.tokenizer = AutoTokenizer.from_pretrained(model_tag)
    self.model = AutoModelForSeq2SeqLM.from_pretrained(model_tag)
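For context, my understanding is that `quantize_dynamic` operates on any `nn.Module`, so it should not matter whether the module came from `AutoModelForSequenceClassification` or `AutoModelForSeq2SeqLM`. Here is a minimal self-contained sketch of the pattern, using a toy module as a stand-in for the real Parrot model (the toy module and its layer names are my own, not part of parrot):

    import torch
    from torch import nn
    from torch.quantization import quantize_dynamic

    # Stand-in for the loaded model: any nn.Module containing nn.Linear
    # layers (as transformer models do) can be quantized the same way.
    class TinySeq2Seq(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Linear(16, 16)
            self.decoder = nn.Linear(16, 16)

        def forward(self, x):
            return self.decoder(torch.relu(self.encoder(x)))

    model = TinySeq2Seq()
    # Swap every nn.Linear for a dynamically quantized int8 version.
    model_q = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    out = model_q(torch.randn(2, 16))

So in principle one could quantize the attribute after constructing the parrot object, along the lines of `parrot.model = quantize_dynamic(parrot.model, {nn.Linear}, dtype=torch.qint8)` (assuming the attribute is accessible, as the loading code above suggests).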