How to optimize an ONNX seq2seq model?

Hi!

I am trying to improve the inference performance of the sshleifer/distilbart-cnn-12-6 summarization model using Optimum. I recently tested some classification models, and just converting them to ONNX gave an inference speedup, but that does not seem to be the case with seq2seq models.

I would like to know how to optimize this kind of model, because the exported model is split into multiple parts and is not supported by the regular optimizer.

You can have a look at this notebook gist, where I tried to optimize each of the ONNX files produced by the export and measured the performance against the standard summarization pipeline.

Thank you

Hi @pablojs,

To optimize a seq2seq model, you should first export it to the ONNX format using ORTModelForSeq2SeqLM and then apply the optimization to each of its components (encoder, decoder and decoder_with_past). We are currently refactoring the ORTOptimizer to simplify its usage; you can follow the progress in #294. You might also be interested in applying dynamic quantization to your model with the ORTQuantizer (refactoring in #270).
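As a rough sketch of the export and dynamic quantization steps, something along these lines might work. It assumes a recent Optimum release with the refactored `ORTQuantizer`; older releases use `from_transformers=True` instead of `export=True`. The helper names `list_onnx_components` and `export_and_quantize`, the directory arguments, and the choice of an AVX512-VNNI quantization config are illustrative assumptions, not something prescribed by the library. The ONNX files are discovered by globbing rather than hard-coded, since the exported file names can differ between versions.

```python
from pathlib import Path


def list_onnx_components(model_dir):
    """Return the ONNX files found directly in model_dir, sorted by name."""
    return sorted(Path(model_dir).glob("*.onnx"))


def export_and_quantize(model_id, onnx_dir, quantized_dir):
    """Export a seq2seq model to ONNX, then dynamically quantize each component.

    Hypothetical helper: the structure is an assumption, only the Optimum
    calls themselves come from the library.
    """
    # Imported lazily so list_onnx_components works without Optimum installed.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Step 1: export to ONNX. This writes one file per component
    # (encoder, decoder, decoder_with_past) into onnx_dir.
    # Assumption: recent Optimum; older versions take from_transformers=True.
    model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
    model.save_pretrained(onnx_dir)

    # Step 2: dynamic quantization config (is_static=False).
    # Assumption: the target CPU supports AVX512-VNNI; pick another
    # AutoQuantizationConfig preset if it does not.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

    # Step 3: quantize each exported component separately.
    for onnx_file in list_onnx_components(onnx_dir):
        quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=onnx_file.name)
        quantizer.quantize(save_dir=quantized_dir, quantization_config=qconfig)

    return list_onnx_components(quantized_dir)
```

Calling `export_and_quantize("sshleifer/distilbart-cnn-12-6", "onnx", "onnx_quantized")` would download the model, so it is worth benchmarking the result against the original pipeline on your own hardware before assuming a speedup.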