Export M2M100 model to ONNX

NNDam · May 9, 2022, 6:53am

I’ve ported facebook/m2m100_418M to ONNX for the translation task using this, but when I visualize it in Netron, it requires 4 inputs: input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, and I don’t know how to run inference with ONNX Runtime.

How can I solve this problem?
Thanks in advance for your help.

double · May 16, 2022, 4:25pm

Did you find a solution?

omoekan · June 30, 2022, 4:50pm

I have the same issue. Have you found a solution yet?

Jour · July 7, 2022, 12:31pm

I tried to convert this model to ONNX with a different feature, python3.8 -m transformers.onnx --model=facebook/m2m100_418M onnx/ --feature=seq2seq-lm-with-past, but in that case it says it needs 54 inputs; otherwise I have the same problem. I know that the model needs the input and output languages, but I can’t really understand how to use the model with ONNX. An example would be welcome.

I also looked for hints in the pull request that added ONNX export support for the model: M2M100 support for ONNX export by michaelbenayoun · Pull Request #15193 · huggingface/transformers · GitHub. I think it could be useful.

osanseviero · July 8, 2022, 9:29am

cc @lewtun

echoRG · August 22, 2022, 1:48am

I have the same question. Could someone share an example for this M2M100 ONNX model? It would be very helpful.

lewtun · August 22, 2022, 2:52pm

Hi folks, the best way to run inference with ONNX models is via the optimum library. It lets you inject ONNX models directly into the pipeline() function from transformers, so you can skip all the annoying pre- and post-processing steps.

Here’s a demo for M2M100 based on the docs:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
# `from_transformers` will export the model to ONNX on-the-fly 🤯
# (newer Optimum releases use `export=True` instead)
model = ORTModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M", from_transformers=True)
# The task name "translation_en_to_de" sets the source (en) and target (de) languages
onnx_translation = pipeline("translation_en_to_de", model=model, tokenizer=tokenizer)

text = "My name is Lewis."
# returns [{'translation_text': 'Mein Name ist Lewis.'}]
pred = onnx_translation(text)
```

Hope that helps!

awaiskaleem · November 30, 2022, 1:37pm

I’m running into the following error when I run @lewtun’s code as is:

AttributeError: type object 'FeaturesManager' has no attribute 'determine_framework'

I’m using the following versions:
torch → ‘1.10.0’
transformers → ‘4.20.1’

lewtun · November 30, 2022, 1:52pm

cc @fxmarty who might be able to take a look

awaiskaleem · November 30, 2022, 1:57pm

Thanks. Also, I’m not sure where the target language ‘de’ is specified in the tokenizer/model above. I’d greatly appreciate your help.

fxmarty · December 1, 2022, 4:30pm

Hi @awaiskaleem, transformers==4.20.1 is 5 months old. Could you try upgrading to the current stable release (pip install --upgrade transformers)? The code snippet from @lewtun works well for me with the current stable transformers and optimum==1.5.1.

Additionally @NNDam, @double, @omoekan, @Jour, @echoRG, @awaiskaleem, I wanted to let you know that the ONNX export through transformers.onnx will likely soon rely on optimum.exporters as a soft dependency, where all export-related code will be maintained. You can check the documentation here.

Now, specifically for M2M100, keep in mind that it is a seq2seq (translation) model! Hence, it uses both an encoder and a decoder, as detailed in the transformers docs. In transformers, the standard usage is model.generate(**inputs). However, by default the ONNX export cannot capture the generation loop that repeatedly calls the decoder: transformers/utils.py at d51e7c7e8265d69db506828dce77eb4ef9b72157 · huggingface/transformers · GitHub. Hence, when exporting to a single ONNX file, unless you do some manual surgery on the ONNX graph, the model will be hard to use.
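For illustration, a single-file export with the four inputs input_ids, attention_mask, decoder_input_ids and decoder_attention_mask can still be driven by a greedy loop written outside the graph, at the cost of re-running the whole model (encoder included) at every step. The sketch below is not Optimum's implementation; it assumes batch size 1, an onnxruntime.InferenceSession-like object whose run(None, feed) returns the logits as its first output, and M2M100-style special tokens supplied by the caller. The function name greedy_translate_single_graph is hypothetical.

```python
import numpy as np


def greedy_translate_single_graph(session, input_ids, attention_mask,
                                  decoder_start_token_id, eos_token_id,
                                  forced_bos_token_id=None, max_new_tokens=32):
    """Hypothetical greedy decoder for a single 4-input seq2seq ONNX graph.

    Assumes batch size 1 and that output 0 of session.run is the logits
    tensor of shape (batch, decoder_len, vocab_size).
    """
    input_ids = np.asarray(input_ids, dtype=np.int64)
    attention_mask = np.asarray(attention_mask, dtype=np.int64)
    decoder_ids = [decoder_start_token_id]
    for step in range(max_new_tokens):
        decoder_input_ids = np.array([decoder_ids], dtype=np.int64)
        feed = {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "decoder_input_ids": decoder_input_ids,
            "decoder_attention_mask": np.ones_like(decoder_input_ids),
        }
        # The whole graph, encoder included, is re-run at every step.
        logits = session.run(None, feed)[0]
        if step == 0 and forced_bos_token_id is not None:
            # M2M100 forces the target-language token as the first generated token.
            next_id = forced_bos_token_id
        else:
            next_id = int(np.argmax(logits[0, -1]))
        decoder_ids.append(next_id)
        if next_id == eos_token_id:
            break
    return decoder_ids
```

With a real model, session could be onnxruntime.InferenceSession("model.onnx") (path hypothetical), input_ids and attention_mask could come from the tokenizer with return_tensors="np", decoder_start_token_id from the model config, and forced_bos_token_id from tokenizer.get_lang_id("de"). The repeated encoder pass is exactly the waste that splitting the model into two files avoids.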

The solution currently explored and used in Optimum’s ORTModelForSeq2SeqLM, which runs on ONNX Runtime, is to use two ONNX files: one for the encoder and one for the decoder.

Using Optimum main (not yet in a stable release, but you can expect it next week), running python -m optimum.exporters.onnx --model valhalla/m2m100_tiny_random --for-ort m2m100_tiny_onnx_ort produces two models:

- an encoder expecting input_ids and attention_mask
- a decoder expecting encoder_attention_mask, input_ids and encoder_hidden_states. This closely follows the transformers decoder and generate().

So if you would like to use these exported ONNX models outside of Optimum, I recommend using the above command to export them and then handling the models yourself. That said, ORTModelForSeq2SeqLM is meant to save you the hassle.
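If you do handle the two exported files yourself, the generation loop might look roughly like the sketch below: the encoder runs once, and only the decoder is called repeatedly. It is a minimal greedy sketch, not ORTModelForSeq2SeqLM's actual code; it assumes batch size 1, the input names listed above, onnxruntime.InferenceSession-like objects whose run(None, feed) returns the encoder hidden states or decoder logits as output 0, and caller-supplied special tokens. The function name greedy_translate_encoder_decoder is hypothetical.

```python
import numpy as np


def greedy_translate_encoder_decoder(encoder_session, decoder_session,
                                     input_ids, attention_mask,
                                     decoder_start_token_id, eos_token_id,
                                     forced_bos_token_id=None, max_new_tokens=32):
    """Hypothetical greedy loop over separate encoder and decoder ONNX sessions.

    Assumes batch size 1; output 0 of the encoder is its last hidden state and
    output 0 of the decoder is the logits tensor (batch, decoder_len, vocab_size).
    """
    input_ids = np.asarray(input_ids, dtype=np.int64)
    attention_mask = np.asarray(attention_mask, dtype=np.int64)
    # The encoder runs exactly once per source sentence.
    encoder_hidden_states = encoder_session.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
    decoder_ids = [decoder_start_token_id]
    for step in range(max_new_tokens):
        feed = {
            "encoder_attention_mask": attention_mask,
            "input_ids": np.array([decoder_ids], dtype=np.int64),
            "encoder_hidden_states": encoder_hidden_states,
        }
        # Without a KV cache, the decoder re-processes the whole prefix each step.
        logits = decoder_session.run(None, feed)[0]
        if step == 0 and forced_bos_token_id is not None:
            # Force the target-language token first, as M2M100 generation does.
            next_id = forced_bos_token_id
        else:
            next_id = int(np.argmax(logits[0, -1]))
        decoder_ids.append(next_id)
        if next_id == eos_token_id:
            break
    return decoder_ids
```

This variant still recomputes the decoder over the full prefix at every step; reusing past key/values avoids that, but requires a decoder graph that accepts them as extra inputs.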

If you want to try it right away, feel free to try the dev version: pip install -U git+https://github.com/huggingface/optimum.git@main

Edit 2022-12-27: Feel free to have a look at the latest release notes, which include this feature: Release v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures · huggingface/optimum · GitHub

ahmedbr · May 29, 2023, 7:56am

This partially worked for me. I was able to load the model and run inference successfully, but the text wasn’t translated into “de” as in your example. The result I got was the following:

  pred: [{'translation_text': 'de: My name is Lewis.'}]

The model I’m using is: “facebook/nllb-200-distilled-600M”

luckyt · June 14, 2023, 8:52am

Hi there! Thanks for the insights, I have a couple of questions about the architecture of the ORT seq2seq model that I hope you could clarify.

Firstly, I’m curious why Optimum requires the encoder and decoder to be loaded from two separate ONNX files, instead of a single ONNX file? I’m guessing (from a quick glance at the source code) that it’s because it utilizes two ORT inference sessions for the encoder and decoder instead of using a single session for the entire model – is there a rationale for this design?

Secondly, you mentioned that the ONNX export of the model faces difficulties with the generate loop in the decoder, while the transformers model seems to handle it fine. I’m wondering what specifically in the generate loop makes it challenging for the ORT model to handle?

fxmarty · June 15, 2023, 5:52am

@ahmedbr Could you file a bug report with a reproduction script on Issues · huggingface/optimum · GitHub so that I can have a look at it?

@luckyt The main reason is that you normally want to run the encoder only once, while you loop over the decoder when generating. You could ask: why not wrap everything into a single ONNX file, with, say, an If node deciding whether or not to run the encoder? Something like this with subgraphs:

This could actually be doable. The issue is that usability suffers, since the encoder and decoder do not have the same inputs/outputs. You would need to create fake inputs/outputs, which works in theory but may lead to errors and be a bit unintuitive.

About generation, what is slightly challenging is that inputs/outputs are fixed in ONNX and, more importantly, that when exporting we use torch.jit.trace, which cannot capture control flow. Control flow is typically what handles the without-past/with-past cases (whether or not the KV cache is used): in the first step of generation you don’t use the KV cache, while in later steps you do. See transformers/src/transformers/models/t5/modeling_t5.py at v4.30.2 · huggingface/transformers · GitHub & How does the ONNX exporter work for GenerationModel with `past_key_value`?
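To make the with-past/without-past distinction concrete, here is a toy single-head attention in NumPy (not the transformers or ONNX code). Without a cache, every step recomputes keys and values for the whole prefix; with a cache, step 0 starts from an empty cache and later steps only project the newest token and append it. Both produce the same outputs, but the cached version has inputs whose shape changes between the first and later steps, which is the kind of branching a traced graph cannot express on its own. All function names here are hypothetical.

```python
import numpy as np


def attend(q, K, V):
    """Scaled dot-product attention of one query vector over keys/values."""
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V


def decode_without_cache(xs, Wq, Wk, Wv):
    # Recompute keys and values for the full prefix at every step.
    outs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        outs.append(attend(xs[t] @ Wq, prefix @ Wk, prefix @ Wv))
    return np.stack(outs)


def decode_with_cache(xs, Wq, Wk, Wv):
    # First step: empty cache. Later steps: append only the new token's key/value.
    K_cache = np.zeros((0, Wk.shape[1]))
    V_cache = np.zeros((0, Wv.shape[1]))
    outs = []
    for x in xs:
        K_cache = np.vstack([K_cache, x @ Wk])
        V_cache = np.vstack([V_cache, x @ Wv])
        outs.append(attend(x @ Wq, K_cache, V_cache))
    return np.stack(outs)
```

The cached loop does less work per step, which is why exported seq2seq models typically ship a decoder variant that accepts past key/values as extra inputs.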
