Optimize an ONNX Seq2Seq model

Z3K3 · August 18, 2022, 8:49pm

I am trying to optimize a Seq2Seq model for summarization. This guide has been very useful for quantizing it, but the quantized model's results aren't accurate, so I want to optimize the model instead and compare the results. I know that I will have to optimize the encoder, decoder, and decoder_with_past separately, but I don't know how. This is what I have so far:

This is the code I'm using to optimize each part, but it isn't working:

import re
import torch
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer
import time
from optimum.onnxruntime import ORTQuantizer, ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig
from pathlib import Path


# load Seq2Seq model and set model file directory
model_id = "facebook/bart-large-cnn"
optimization_config = OptimizationConfig(optimization_level=99) 
onnx_path = Path('/')  # must be a Path so the "/" joins below work

# Create encoder optimizer
encoder_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/encoder_model.onnx", feature='seq2seq-lm')
encoder_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "encoder.onnx",
                        onnx_optimized_model_output_path=onnx_path / "encoder_optimized.onnx")

# Create decoder optimizer
decoder_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/decoder_model.onnx", feature='seq2seq-lm')
decoder_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "decoder.onnx",
                        onnx_optimized_model_output_path=onnx_path / "decoder_optimized.onnx")

# Create decoder with past key values optimizer
decoder_wp_optimizer = ORTOptimizer.from_pretrained(model_name_or_path="/testing/decoder_with_past_model.onnx", feature='seq2seq-lm')
decoder_wp_optimizer.export(optimization_config=optimization_config,
                        onnx_model_path=onnx_path / "decoder_wp.onnx",
                        onnx_optimized_model_output_path=onnx_path / "decoder_wp_optimized.onnx")


OUTPUT:
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Any help would be greatly appreciated.

echarlaix · August 23, 2022, 4:07pm

Hi @Z3K3,

We are currently refactoring the ORTOptimizer to simplify its usage; you can follow the progress in #294. You can find an example of how to apply optimization to a Seq2Seq model in the associated documentation.

mineshj1291 · November 17, 2022, 12:09pm

Hi, I am trying to export the decoder-with-past model and could not find any good documentation. The documentation link is not working for me. Can you direct me to the updated page? Thanks.

echarlaix · November 17, 2022, 2:33pm

Hi @mineshj1291,

Check out our documentation for more information and examples for the ONNX export of Seq2Seq models as well as their optimization with our ORTOptimizer.
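
For reference, here is a minimal sketch of the workflow described above, assuming a recent optimum release in which ORTOptimizer.from_pretrained accepts an already loaded ORTModelForSeq2SeqLM and optimizes the encoder, decoder, and decoder-with-past files in one call. The helper name optimize_seq2seq and the default optimization level of 2 are illustrative choices, not something from this thread, and the export=True argument is spelled from_transformers=True in older releases.

```python
from pathlib import Path


def optimize_seq2seq(model_id: str, save_dir: str, optimization_level: int = 2) -> Path:
    """Export a Seq2Seq checkpoint to ONNX and optimize every exported part.

    Hypothetical helper for illustration; not part of optimum itself.
    """
    # Imported lazily so this module can be imported without optimum installed.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
    from optimum.onnxruntime.configuration import OptimizationConfig

    # Export the checkpoint to ONNX. Assumption: a release that accepts
    # `export=True` (older releases used `from_transformers=True`).
    model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

    # Build the optimizer from the loaded model instead of from a path
    # to a single .onnx file.
    optimizer = ORTOptimizer.from_pretrained(model)
    optimization_config = OptimizationConfig(optimization_level=optimization_level)

    # Writes the optimized ONNX files into save_dir.
    optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
    return Path(save_dir)
```

Calling optimize_seq2seq("facebook/bart-large-cnn", "bart_onnx_optimized") would download the checkpoint, export it, and write the optimized files to the given directory (the directory name here is just an example). Passing the whole model rather than individual .onnx paths avoids having from_pretrained treat a file path as a model identifier.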
