How does the ONNX exporter work for GenerationModel with `past_key_value`?

Hi @fxmarty

It does work for a small model!

But when exporting a larger one, I got a CUDA out-of-memory (OOM) error. Could you share some insights? Your suggestions have always been helpful!

I opened a new thread here: "CUDA OOM when export a large model to ONNX" (🤗 Optimum category, Hugging Face Forums)