I was wondering if Hugging Face provides any support for exporting the `generate` function from the transformers library to ONNX?
Mainly, I'm trying to create an ONNX model from a GPT-2-style transformer in order to speed up inference when generating replies in a conversation.
I see there's some support for exporting a single forward call to GPT2, but not the entire loop used in greedy decoding, beam search, nucleus sampling, etc.
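To make concrete what I mean by "the loop": even with the single forward pass exported to ONNX, the decoding loop would still have to run in Python around the exported graph, roughly like this. (A minimal sketch — `next_token_logits` is a hypothetical toy stand-in for one exported-model call, e.g. an onnxruntime `session.run`, not a real API.)

```python
def next_token_logits(token_ids):
    # Toy scorer standing in for one ONNX forward pass: favors the token
    # equal to (last token + 1) mod vocab size, with token 0 acting as
    # an end-of-sequence token once id 4 has been produced.
    vocab_size = 5
    last = token_ids[-1]
    target = 0 if last == 4 else (last + 1) % vocab_size
    return [1.0 if tok == target else 0.0 for tok in range(vocab_size)]

def greedy_decode(prompt_ids, eos_id=0, max_new_tokens=10):
    # This Python loop is the part NOT captured when only a single
    # forward call is exported: each step feeds the growing sequence
    # back into the model and appends the argmax token.
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=lambda t: logits[t])
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

print(greedy_decode([1]))  # → [1, 2, 3, 4, 0]
```

Ideally this whole loop (plus beam search / sampling variants) could live inside the exported graph so inference never bounces back to Python between tokens.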