How can I use the ONNX model?

I performed some tasks and my model was saved at ./oyot5_small_unyo_dcyo_mix. Later, I converted it to an ONNX model with this command:

optimum-cli export onnx --model oyot5_small_unyo_dcyo_mix/ --task text2text-generation oyto_t5_small_onnx/

The export was saved to ./oyto_t5_small_onnx. See the picture below for the output of the conversion. How can I use these ONNX models? I understand why I have encoder_model.onnx and decoder_model.onnx: I'm using a pre-trained T5 model, which has an encoder-decoder architecture. My pain point is that I can't figure out how to run inference with the converted ONNX models.

What resources can help me?

Screenshot 2024-01-28 at 14.39.13


You can run inference with the ORTModelForSeq2SeqLM class (ORT is short for ONNX Runtime). Just pass your folder to its from_pretrained method. Docs: Models

See for instance the T5 ONNX model here: optimum/t5-small · Hugging Face.


Thank you @nielsr, I read the documentation you shared extensively and came up with this code.

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "oyto_t5_small_onnx/"

# from_pretrained loads encoder_model.onnx, decoder_model.onnx, and config.json
# from the export folder, so no manual InferenceSession or config loading is needed
model = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# translation pipeline tasks follow the pattern "translation_xx_to_yy";
# replace src/target with your language pair
onnx_translation = pipeline("translation_src_to_target", model=model, tokenizer=tokenizer)

text = 'the text to perform your translation task'
result = onnx_translation(text, max_length=10000)