Hey,
Did you try quantization?
There is an example for the Pegasus model here. I tried it and it performed pretty well for summarization, with inference time decreasing by 2x to 3x.
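In case it helps, here is a rough sketch of the idea behind quantization: float weights are mapped to 8-bit integers plus a single scale factor, so they take less memory and can be run with cheaper integer arithmetic. This is not the code from the Pegasus example mentioned above. The function names `quantize_int8` and `dequantize` are made up for illustration, and real toolkits apply this per layer with optimized kernels.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8-range ints plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical toy weights, just to show the round trip.
weights = [0.5, -1.27, 0.0, 0.31]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

The restored values differ from the originals by at most about half a quantization step (`scale / 2`), which is the accuracy you trade for the speedup, so it is worth checking summary quality on your own data afterwards.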