Optimize an ONNX Seq2Seq model

I am trying to optimize a Seq2Seq model for summarization. This guide has been very useful for quantizing it, but the quantized model's results aren't accurate, so I want to optimize the model instead and compare the results. How can I optimize a Seq2Seq model? I know that I will have to optimize the encoder, decoder, and decoder_with_past separately, but I don't know how.

This is the code I'm using to optimize each part, but it isn't working:

import re
import torch
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer
import time
from optimum.onnxruntime import ORTQuantizer, ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig
from pathlib import Path


# load Seq2Seq model and set model file directory
model_id = "facebook/bart-large-cnn"
optimization_config = OptimizationConfig(optimization_level=99) 
onnx_path = Path('/')  # a Path, so the "/" joins below are valid

# Create encoder optimizer
encoder_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/encoder_model.onnx", feature='seq2seq-lm')
encoder_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "encoder.onnx",
                        onnx_optimized_model_output_path=onnx_path / "encoder_optimized.onnx")

# Create decoder optimizer
decoder_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/decoder_model.onnx", feature='seq2seq-lm')
decoder_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "decoder.onnx",
                        onnx_optimized_model_output_path=onnx_path / "decoder_optimized.onnx")

# Create decoder with past key values optimizer
decoder_wp_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/decoder_with_past_model.onnx", feature='seq2seq-lm')
decoder_wp_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "decoder_wp.onnx",
                        onnx_optimized_model_output_path=onnx_path / "decoder_wp_optimized.onnx")


OUTPUT:
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Any help would be greatly appreciated.

Hi @Z3K3,

We are currently refactoring the ORTOptimizer to simplify its usage; you can follow the progress in #294. The associated documentation has an example of how to apply optimization to a Seq2Seq model.
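
For reference, here is a minimal sketch of what this can look like with the refactored API, assuming a recent Optimum release in which `ORTOptimizer.from_pretrained` accepts a loaded `ORTModelForSeq2SeqLM` and `optimize` takes a `save_dir`. The helper name `optimize_seq2seq` and the default level of 2 are only illustrative; older releases used `from_transformers=True` instead of `export=True`, so please check the documentation for your installed version.

```python
from pathlib import Path


def optimize_seq2seq(model_id: str, save_dir: str, optimization_level: int = 2) -> Path:
    """Export a Seq2Seq checkpoint to ONNX and optimize its ONNX files.

    Hypothetical helper for illustration, not part of Optimum.
    """
    # Imported here so the heavy dependencies load only when the function runs.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
    from optimum.onnxruntime.configuration import OptimizationConfig

    # Export the PyTorch checkpoint to ONNX.
    # Assumption: a recent release; older ones used from_transformers=True.
    model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

    # Build a single optimizer from the loaded model instead of pointing
    # one optimizer at each .onnx file.
    optimizer = ORTOptimizer.from_pretrained(model)
    config = OptimizationConfig(optimization_level=optimization_level)

    # Write the optimized ONNX files into the output directory.
    out_dir = Path(save_dir)
    optimizer.optimize(save_dir=out_dir, optimization_config=config)
    return out_dir
```

Calling `optimize_seq2seq("facebook/bart-large-cnn", "bart_large_cnn_optimized")` downloads the checkpoint, so it needs network access and some disk space. You can pass `optimization_level=99` to match the setting in your snippet.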
