I am trying to improve the inference performance of the
sshleifer/distilbart-cnn-12-6 summarization model using Optimum. I recently tested some classification models, and simply converting them to ONNX already gave an inference speedup, but that does not seem to be the case with seq2seq models.
I would like to know the process for optimizing this kind of model, because the exported model is split into multiple parts and is not supported by the regular optimizer.
You can have a look at this notebook gist, where I try to optimize each of the ONNX files produced after saving and measure the performance against the standard summarization pipeline.