Hi @Xenova, thank you for giving this a try!
I had the same experience as you with gpt2 using only decoder_with_past_model.onnx (although I had to pass position_ids as an input to get matching logits, due to this logic).
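In case it helps, here is a minimal sketch of how I derive position_ids from the attention mask (the helper name and past_length handling are my own, not anything from the exported model itself):

```python
import numpy as np

def build_position_ids(attention_mask: np.ndarray, past_length: int = 0) -> np.ndarray:
    # Cumulative sum over the mask gives 0-based positions for real tokens;
    # padding positions would go negative, so clamp them to 0 to keep valid indices.
    position_ids = attention_mask.cumsum(axis=-1) - 1
    position_ids = np.clip(position_ids, 0, None)
    if past_length > 0:
        # With past key/values only the new token is fed, so keep just its position
        # (the mask here covers past + new tokens).
        position_ids = position_ids[:, -1:]
    return position_ids

mask = np.array([[1, 1, 1, 1]])
print(build_position_ids(mask))                 # [[0 1 2 3]]
print(build_position_ids(mask, past_length=3))  # [[3]]
```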
Unfortunately I did not have time to try encoder-decoder models; I was assuming it was possible. I can have a look shortly.
Alternatively, are you able to support ONNX models that have subgraphs? That’s the approach we are currently taking in Optimum, for reference: Validating ONNX model fails for GPT-J · Issue #607 · huggingface/optimum · GitHub. This is only available for decoder-only models for now, though; I plan to extend it to encoder-decoder architectures.
If you export a decoder-only model, you’ll see as output a merged decoder that handles both cases, without past and with past: `optimum-cli export onnx gpt2 gpt2_onnx/`