T5 inference performance

Hey :wave:,

Have you tried quantization?

There is an example for the Pegasus model here. I tried it, and it performed pretty well for summarization, cutting inference time by a factor of 2 to 3.
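
In case it helps, here is a rough sketch of what quantization can look like with PyTorch's dynamic quantization. This is not the Pegasus example itself, and it may not be the same technique that example uses. `TinyFeedForward` is a hypothetical stand-in so the snippet runs without downloading anything; in practice you would pass your loaded T5 model to `quantize_dynamic` instead, and the speedup you get will depend on your hardware and model.

```python
import torch
from torch import nn


# Hypothetical stand-in for a transformer feed-forward block (not real T5 code).
# A loaded T5 model would be passed to quantize_dynamic in the same way.
class TinyFeedForward(nn.Module):
    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff)
        self.act = nn.ReLU()
        self.wo = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.wo(self.act(self.wi(x)))


torch.manual_seed(0)
model = TinyFeedForward().eval()

# Replace every nn.Linear with an int8 dynamically quantized version.
# Weights are stored as int8; activations are quantized on the fly at inference.
# This runs on CPU and returns a converted copy, leaving `model` untouched.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 10, 64)  # (batch, sequence, d_model)
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized_model(x)

# Quantization is lossy, so compare outputs to get a feel for the error.
max_abs_diff = (fp32_out - int8_out).abs().max().item()
print(f"max abs difference between fp32 and int8 outputs: {max_abs_diff:.4f}")
```

Dynamic quantization only targets CPU inference, so it is worth timing both models on your own inputs and checking output quality before switching.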