Hi, I am using a couple of models from transformers. They work well on GPU, but their performance on CPU is poor.
I am using the Google Pegasus-XSum model for summarization, and it takes around 15 seconds to produce a result. I am also using the Parrot paraphrase library, which uses a T5 model in the backend; it is also very slow on CPU, taking around 5-7 seconds to generate a result.
Here are the links to both: google/pegasus-xsum · Hugging Face
Parrot Paraphrase: prithivida/parrot_paraphraser_on_T5 · Hugging Face
Any tips or suggestions for speeding up prediction on CPU would be appreciated, as I am currently limited by my server…
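
For reference, here is a minimal sketch of how CPU latency figures like the ones above could be measured and reproduced. It assumes the standard `pipeline` API from transformers with `device=-1` to force CPU; the helper names `time_call` and `build_summarizer` are hypothetical and only for illustration, and the numbers will vary with hardware and input length.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the mean wall-clock seconds of calling fn(*args, **kwargs)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def build_summarizer():
    """Load google/pegasus-xsum on CPU.

    Assumption: transformers and a backend such as torch are installed.
    The import is done lazily so time_call can be used without them.
    """
    from transformers import pipeline

    # device=-1 tells the pipeline to run on CPU rather than a GPU.
    return pipeline("summarization", model="google/pegasus-xsum", device=-1)
```

With the packages installed, something like `time_call(build_summarizer(), some_text, repeats=3)` would give an average per-request latency to compare before and after any optimization.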