Transformers.onnx vs optimum.onnxruntime

Hello. I am interested in converting a model to ONNX to get faster inference, but I saw there are two possible approaches: the transformers.onnx package and optimum.onnxruntime.

Should I convert the model to ONNX with the first and then use it with Optimum? It looks like Optimum can convert models to ONNX on its own now, so what is the point of the transformers.onnx package?

Hi @Maxinho,

The ORTModel APIs in Optimum handle the conversion of models from PyTorch to ONNX when needed (we currently use the export in transformers.onnx), and implement inference for different tasks so that you can use them just like the AutoModel APIs in Transformers.
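For example, here is a minimal sketch of loading a model with an ORTModel class and running it through a pipeline. The checkpoint name is just illustrative, and depending on your Optimum version the export argument is either export=True (newer releases) or from_transformers=True (older ones):

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint

# Exports the PyTorch model to ONNX on the fly; older Optimum versions
# use from_transformers=True instead of export=True.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The ORTModel drops straight into the usual Transformers pipeline API.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("ONNX Runtime makes inference faster."))
```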

In terms of acceleration, Optimum offers ORTOptimizer and ORTQuantizer, with which you can optimize your computation graph and quantize your ONNX model to accelerate inference even further.
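As a rough sketch of what that looks like (the configuration classes and defaults here are based on recent Optimum versions, so check the docs for your release), graph optimization and dynamic quantization go roughly like this:

```python
from optimum.onnxruntime import (
    ORTModelForSequenceClassification,
    ORTOptimizer,
    ORTQuantizer,
)
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# Graph optimization: node fusion and similar rewrites.
# Level 2 enables extended fusions on top of the basic ones.
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="onnx_optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)

# Dynamic quantization targeting AVX512-VNNI CPUs; pick the
# AutoQuantizationConfig preset that matches your hardware.
quantizer = ORTQuantizer.from_pretrained(model)
quantizer.quantize(
    save_dir="onnx_quantized",
    quantization_config=AutoQuantizationConfig.avx512_vnni(is_static=False),
)
```

The optimized or quantized model saved in save_dir can then be reloaded with the same ORTModel class and used exactly as in the first example.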