How to optimize ONNX seq2seq model?

pablojs · July 26, 2022, 11:53am

Hi!

I am trying to improve the performance of the sshleifer/distilbart-cnn-12-6 summarization model using Optimum. I recently tested some classification models, and just converting them to ONNX gave an inference speedup, but that does not seem to be the case with seq2seq models.

I would like to know how to optimize this kind of model, because it is split into multiple parts and is not supported by the regular optimizer.

You can have a look at this notebook gist, where I tried to optimize each of the ONNX files produced when saving the model and compared the performance against the standard summarization pipeline.
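A comparison like that can be sketched with a small timing helper. This is only an illustration: `time_call` is a hypothetical name, not part of Optimum or Transformers, and the warmup and run counts are arbitrary defaults.

```python
import statistics
import time


def time_call(fn, *args, warmup=2, runs=10, **kwargs):
    """Return the median wall-clock latency (seconds) of fn(*args, **kwargs)."""
    # Warmup calls are not timed, so one-off setup costs do not skew the result.
    for _ in range(warmup):
        fn(*args, **kwargs)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timings)
```

Passing the standard pipeline and the ONNX-backed pipeline to the same helper, with the same input text, keeps the two measurements comparable.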

Thank you

echarlaix · July 27, 2022, 8:12am

Hi @pablojs,

To optimize a seq2seq model, you should first export it to the ONNX format using ORTModelForSeq2SeqLM and then apply optimization to each of its components (encoder, decoder and decoder_with_past). We are currently refactoring the ORTOptimizer to simplify its usage; you can follow the progress in #294. You might also be interested in applying dynamic quantization to your model with the ORTQuantizer (refactoring in #270).
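A minimal sketch of the export-then-process-each-component flow, using dynamic quantization as the per-component step. It rests on several assumptions: a recent Optimum release where `export=True` triggers the ONNX export (older releases used `from_transformers=True`), exported files named `encoder_model.onnx`, `decoder_model.onnx` and `decoder_with_past_model.onnx` (the names can differ between versions), and a CPU for which the `avx512_vnni` quantization preset is appropriate. The function and directory names are hypothetical.

```python
from pathlib import Path

# Assumed file names for the three components mentioned above;
# check the export directory, since they depend on the Optimum version.
COMPONENTS = (
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
)


def component_paths(export_dir):
    """Return the expected path of each exported ONNX component."""
    return [Path(export_dir) / name for name in COMPONENTS]


def export_and_quantize(model_id, export_dir, quantized_dir):
    """Export a seq2seq model to ONNX, then quantize each component found."""
    # Imported here so component_paths() stays usable without optimum installed.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Step 1: export to ONNX (assumes export=True is supported).
    model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
    model.save_pretrained(export_dir)

    # Step 2: dynamic quantization, one ONNX file at a time.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    for path in component_paths(export_dir):
        if not path.exists():
            continue  # this component was not produced by the export
        quantizer = ORTQuantizer.from_pretrained(export_dir, file_name=path.name)
        quantizer.quantize(save_dir=quantized_dir, quantization_config=qconfig)
```

For example, `export_and_quantize("sshleifer/distilbart-cnn-12-6", "onnx", "onnx_quantized")` would write the quantized files to `onnx_quantized`. Whether this gives a speedup depends on the hardware and sequence lengths, so it is worth measuring against the original pipeline.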

echarlaix · August 25, 2022, 8:24am

Hi @Z3K3, let’s move this discussion to Optimize an ONNX Seq2Seq model as you are describing the same problem there. Please don’t cross-post the same question in multiple topics in the future as it makes things difficult to track.
