Hi, I am new to transformers. I am using some of its models for several tasks. One is summarization with the google/pegasus-xsum model: performance is good on GPU, but on CPU it takes around 16-18 seconds. I have also started using the parrot-paraphrase library, which uses a T5 model in the backend; it performs similarly on GPU, but on CPU it takes around 5-8 seconds to produce a result. Due to GPU limitations on my server, I have to optimize for CPU and bring the response time down to 2-4 seconds at most.
Here are the links to the models I am using:
Pegasus-XSUM: google/pegasus-xsum · Hugging Face
Parrot Paraphraser: prithivida/parrot_paraphraser_on_T5 · Hugging Face
Code for the Pegasus model:

from transformers import PegasusTokenizer, PegasusForConditionalGeneration
from trained_model import ModelFactory
import os

# Path to the local directory holding the downloaded model files
project_root = os.path.dirname(os.path.dirname(__file__))
path = os.path.join(project_root, 'models/')

class Summarization:
    def __init__(self):
        self.mod = ModelFactory("summary")
        self.tokenizer = PegasusTokenizer.from_pretrained(path)

    def generate_abstractive_summary(self, text):
        """Generate a summary of the text using the Pegasus model."""
        model = self.mod.get_preferred_model("summarization")
        inputs = self.tokenizer([text], max_length=1024, return_tensors='pt', truncation=True)
        # generate() is called with the model's default decoding settings
        summary_ids = model.generate(inputs['input_ids'])
        summary = [self.tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False)
                   for g in summary_ids]
        return summary[0]
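
One thing I noticed is that generate() above runs with the model's default decoding settings, and from what I have read, beam search is one of the more expensive parts on CPU. Here is a rough sketch of what I have been experimenting with (the num_beams and max_new_tokens values are just examples I picked, not values from the model docs):

import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
model.eval()

text = "Some long article text to summarize."
inputs = tokenizer([text], max_length=1024, return_tensors="pt", truncation=True)

# inference_mode() skips autograd bookkeeping; num_beams=1 replaces the
# default beam search with greedy decoding, and max_new_tokens caps the
# number of decoder steps. Both values here are arbitrary examples.
with torch.inference_mode():
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=1,
        max_new_tokens=64,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

I assume greedy decoding trades some summary quality for speed, so I am not sure this alone is the right approach or enough to reach 2-4 seconds.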
Is there any way to improve the performance? Even a small improvement would be appreciated.
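
P.S. I have also been reading that dynamic quantization can speed up transformer inference on CPU, but I am not sure how well it applies to these models. This is an untested sketch of what I mean, using PyTorch's torch.quantization.quantize_dynamic (I have not verified the speed gain or quality impact on Pegasus or T5):

import torch
from transformers import PegasusForConditionalGeneration

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
model.eval()

# Swap the Linear layers for dynamically quantized int8 versions:
# weights are stored as int8 and activations are quantized on the fly.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Pinning the intra-op thread count to the machine's physical core count
# is another knob I have seen mentioned (the 4 here is just an example).
torch.set_num_threads(4)

If anyone can confirm whether either of these is the right direction, that would help a lot.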