Export M2M100 model to ONNX

@ahmedbr Could you file a bug report with a reproduction script on Issues · huggingface/optimum · GitHub so that I can have a look at it?

@luckyt The main reason is that you normally want to run the encoder only once, while you'd like to loop over the decoder when generating. You could ask: why not wrap everything into a single ONNX model, with, say, an If node deciding whether or not to run the encoder? Something like this with subgraphs:
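For illustration, here is a minimal sketch (not Optimum's exporter code) of what such an If node with two subgraphs looks like when built with the `onnx.helper` API. The `Identity`/`Neg` ops just stand in for the encoder and decoder bodies, and names like `use_encoder` are made up:

```python
import onnx
from onnx import helper, TensorProto

# Subgraph taken when the condition is true (stand-in for the encoder).
then_out = helper.make_tensor_value_info("enc_out", TensorProto.FLOAT, [1, 4])
then_node = helper.make_node("Identity", inputs=["x"], outputs=["enc_out"])
then_graph = helper.make_graph([then_node], "encoder_branch", [], [then_out])

# Subgraph taken when the condition is false (stand-in for the decoder).
else_out = helper.make_tensor_value_info("dec_out", TensorProto.FLOAT, [1, 4])
else_node = helper.make_node("Neg", inputs=["x"], outputs=["dec_out"])
else_graph = helper.make_graph([else_node], "decoder_branch", [], [else_out])

# Top-level graph: a boolean input decides which subgraph runs.
cond = helper.make_tensor_value_info("use_encoder", TensorProto.BOOL, [])
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
out = helper.make_tensor_value_info("branch_out", TensorProto.FLOAT, [1, 4])

if_node = helper.make_node(
    "If", inputs=["use_encoder"], outputs=["branch_out"],
    then_branch=then_graph, else_branch=else_graph,
)
graph = helper.make_graph([if_node], "encoder_or_decoder", [cond, x], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)
```

Note that both branches here happen to take the same input `x` from the outer scope; with a real encoder and decoder, the two branches need different inputs and produce different outputs, which is exactly the usability problem described below.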

This could be doable, actually. The issue is that usability becomes harder, because the encoder and decoder do not have the same inputs/outputs. You would need to create fake inputs/outputs, which works in theory but may lead to errors and be a bit unintuitive.

About generation, what is slightly challenging is that inputs/outputs are fixed with ONNX, and, more importantly, when exporting we use torch.jit.trace, which cannot handle control flow; control flow is typically what handles the without/with-past case (whether or not the KV cache is used). In the first step of generation you don't use the KV cache, while in later steps you do. See transformers/src/transformers/models/t5/modeling_t5.py at v4.30.2 · huggingface/transformers · GitHub & How does the ONNX exporter work for GenerationModel with `past_key_value`?
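To make the tracing limitation concrete, here is a minimal sketch (a hypothetical toy module, not the actual T5/M2M100 code) of the kind of with/without-past branching torch.jit.trace cannot capture: tracing only records the branch taken for the example inputs, which is why the exporter typically produces separate decoder and decoder-with-past graphs.

```python
import torch
import torch.nn as nn

class ToyDecoderStep(nn.Module):
    def forward(self, hidden_states, past_key=None):
        if past_key is None:
            # First generation step: no KV cache yet.
            key = hidden_states
        else:
            # Later steps: reuse the cached keys and append the new one.
            key = torch.cat([past_key, hidden_states], dim=1)
        return key

module = ToyDecoderStep()
x = torch.randn(1, 1, 8)

# Tracing with past_key=None bakes in the "without past" branch: the resulting
# graph will never take the "with past" path, no matter what you feed it later.
traced = torch.jit.trace(module, (x,))
```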
