Using ONNX for text generation with GPT-2

Hi @valhalla @patrickvonplaten, I've been working with onnx_transformers and trying to use ONNX with a GPT-2 model for the text-generation task. I used the transformers pipeline for text generation, but generating text took quite long (20 to 30 s). I tried different approaches, such as handling it with cron jobs, but that didn't help. Then I found your repo and thought of using ONNX to accelerate text generation, but according to the README, onnx_transformers does not support text generation. I also tried some of the methods in this notebook: Inference_GPT2_with_OnnxRuntime_on_CPU, but the quality of the generated text was nowhere near that of the transformers pipeline. Could you give me some insight into this runtime issue and how I can speed up text generation without simply adding more resources?
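On the quality gap: one plausible cause (an assumption, not checked against that notebook) is the decoding strategy rather than ONNX itself. For `gpt2`, the model config's task-specific defaults enable sampling for text generation, so the pipeline samples, while a hand-written loop around an ONNX session is often greedy, and greedy GPT-2 output tends to repeat itself. Below is a minimal, stdlib-only sketch of both strategies. `logits_fn` is a hypothetical stand-in for one forward pass of the exported model (for example a wrapper around `onnxruntime.InferenceSession.run`; the input and output names depend on how the model was exported).

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical: maps the token ids generated so far to next-token logits.
# With onnxruntime this would wrap session.run(None, {...}); the feed names
# depend on the export and are an assumption here.
LogitsFn = Callable[[Sequence[int]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(logits_fn: LogitsFn, prompt: Sequence[int], max_new_tokens: int,
             top_k: int = 0, eos_id: int = -1, seed: int = 0) -> List[int]:
    """Greedy decoding if top_k <= 0, otherwise top-k sampling."""
    rng = random.Random(seed)
    ids = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)
        if top_k <= 0:
            # greedy: always take the single most likely token
            next_id = max(range(len(logits)), key=logits.__getitem__)
        else:
            # top-k: keep the k highest-scoring tokens and sample among them
            top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:top_k]
            probs = softmax([logits[i] for i in top])
            next_id = rng.choices(top, weights=probs, k=1)[0]
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```

With `top_k=1` the sampling branch behaves exactly like greedy search; larger `top_k` values trade determinism for more varied text.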
We’ve recently added an example of exporting BART with ONNX, including beam search generation:

However, it doesn’t include a README right now, which could be very useful to explain how exactly the model can be used. I’ve asked the author to add it.
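For readers new to beam search, here is a toy illustration of the idea (a stdlib-only sketch, not the implementation used in that BART example): at each step, keep the `num_beams` partial sequences with the highest summed log-probability. `logits_fn` is again a hypothetical stand-in for one forward pass of the model, and there is no length penalty, so finished sequences of different lengths are compared by raw score.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical: maps the token ids so far to next-token logits (one forward pass).
LogitsFn = Callable[[Sequence[int]], List[float]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # numerically stable log-softmax
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def beam_search(logits_fn: LogitsFn, prompt: Sequence[int], max_new_tokens: int,
                num_beams: int = 3, eos_id: int = -1) -> List[int]:
    """Return the highest-scoring sequence (sum of token log-probs, no length penalty)."""
    beams: List[Tuple[List[int], float]] = [(list(prompt), 0.0)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_new_tokens):
        # expand every live beam by every vocabulary token
        candidates = []
        for ids, score in beams:
            for tok, lp in enumerate(log_softmax(logits_fn(ids))):
                candidates.append((ids + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # keep the best num_beams unfinished candidates; set aside those ending in EOS
        beams = []
        for ids, score in candidates:
            if ids[-1] == eos_id:
                finished.append((ids, score))
            else:
                beams.append((ids, score))
            if len(beams) == num_beams:
                break
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```

With `num_beams=1` this reduces to greedy search; a wider beam can find a sequence whose first token is not the single most likely one but whose overall probability is higher. Each step costs roughly `num_beams` forward passes, which matters for the runtime concern above.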

Thanks @nielsr

The new URL is here: transformers/examples/research_projects/onnx/summarization at main · huggingface/transformers · GitHub

