Hi!
I am trying to improve the performance of the sshleifer/distilbart-cnn-12-6 summarization model using Optimum. I recently tested some classification models, and just converting them to ONNX gave an inference performance gain, but that does not seem to be the case with seq2seq models.
I would like to know the process for optimizing this kind of model, because it is split into multiple parts and not supported by the regular optimizer.
You can have a look at this notebook gist, where I try to optimize each of the ONNX files produced after saving, while measuring performance against the standard summarization pipeline.
Thank you
Hi @pablojs,
To optimize a seq2seq model, you should first export it to the ONNX format using ORTModelForSeq2SeqLM and then apply the optimization to each of its components (encoder, decoder, and decoder_with_past). We are currently working on refactoring the ORTOptimizer to simplify its usage; you can follow the progress in #294. You might also be interested in applying dynamic quantization to your model with the ORTQuantizer (refactoring in #270).
Hi @Z3K3, let’s move this discussion to Optimize an ONNX Seq2Seq model, as you are describing the same problem there. Please don’t cross-post the same question in multiple topics in the future, as it makes things difficult to track.