Improving the prediction (inference) performance of a Transformers model

One approach is to export the model to ONNX and apply quantization; other optimizations are possible as well.
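To show what quantization does, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization. Dynamic quantization applies this kind of mapping to the weights of linear layers. The helper names (`quantize_int8`, `dequantize_int8`, `int8_dot`) are hypothetical and are not part of ONNX Runtime, PyTorch, or Transformers. The sketch only illustrates the idea and does not replace a real toolkit.

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Hypothetical helper names; NOT an ONNX Runtime or PyTorch API.
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single scale for the tensor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor: any positive scale works there.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest, then clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]


def int8_dot(q_a: List[int], scale_a: float, q_b: List[int], scale_b: float) -> float:
    """Dot product using integer multiply-accumulate, rescaled once at the end."""
    acc = sum(a * b for a, b in zip(q_a, q_b))  # pure integer accumulation
    return acc * scale_a * scale_b


if __name__ == "__main__":
    w = [0.5, -1.27, 0.03, 0.9]   # example "weights"
    x = [1.0, 0.2, -0.4, 0.7]     # example "activations"
    qw, sw = quantize_int8(w)
    qx, sx = quantize_int8(x)
    exact = sum(a * b for a, b in zip(w, x))
    approx = int8_dot(qw, sw, qx, sx)
    print("int8 weights:", qw, "scale:", sw)
    print("exact dot:", exact, "int8 dot:", approx)
```

You would not normally implement this by hand. PyTorch's `torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)` applies dynamic int8 quantization to a model's linear layers. For an exported ONNX model, ONNX Runtime provides `onnxruntime.quantization.quantize_dynamic`. Storing weights as int8 instead of float32 reduces their memory by about 4×, and integer arithmetic is often faster on CPU. The trade-off is a small approximation error, as the two dot products above show.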

This thread can help: Fast CPU Inference On Pegasus-Large Finetuned Model -- Currently Impossible? - #4 by the-pale-king