Using onnx for text-generation with GPT-2

Hi @valhalla @patrickvonplaten, I have been working with onnx_transformers and using ONNX with a GPT-2 model for the text-generation task. I first used the transformers pipeline for text-generation, but generating text took rather long (20~30 s). I tried different approaches, such as using cron jobs to handle it, but that didn't help. Then I found your repo and thought of using ONNX to accelerate text generation. According to the README on the repo, though, onnx_transformers has no text-generation support. I also tried some methods from this notebook: Inference_GPT2_with_OnnxRuntime_on_CPU, but the quality of the generated text was nowhere near that of the transformers pipeline. Could you please give me some insight into this runtime issue and how I can accelerate text generation besides increasing resources?
Thanks :slightly_smiling_face:

We’ve recently added an example of exporting BART with ONNX, including beam search generation, to the huggingface/transformers repository on GitHub: transformers/examples/onnx/pytorch/translation (master branch).

However, it doesn’t include a README right now, which could be very useful to explain how exactly the model can be used. I’ve asked the author to add it.
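On the quality gap mentioned above: one possible cause (an assumption, not something confirmed in this thread) is the decoding strategy rather than the ONNX export itself. A hand-written loop that always takes the most likely next token tends to produce more repetitive text than a loop that samples. Here is a minimal, library-free sketch of the two strategies. `toy_logits` is a hypothetical stand-in for the model's forward pass (for example an ONNX Runtime session call), and none of the names below come from onnx_transformers or the notebook.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Top-k sampling: sample among the k highest-scoring token ids."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] / temperature for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


def generate(next_token_logits, prompt_ids, max_new_tokens, pick):
    """Generic decoding loop: score the sequence, pick a token, append it."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(pick(next_token_logits(ids)))
    return ids


def toy_logits(ids, vocab_size=5):
    """Hypothetical stand-in for a real model: favors last+1, then last+2."""
    last = ids[-1]
    scores = [0.0] * vocab_size
    scores[(last + 2) % vocab_size] = 1.5
    scores[(last + 1) % vocab_size] = 2.0
    return scores


greedy_ids = generate(toy_logits, [0], 4, greedy_pick)
sampled_ids = generate(
    toy_logits, [0], 4,
    lambda logits: top_k_sample(logits, k=2, rng=random.Random(0)),
)
```

With a real model, `toy_logits` would be replaced by a call that returns the logits for the last position. Comparing the two `pick` functions on the same prompt is a quick way to check whether the decoding strategy explains the difference in quality.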

Thanks @nielsr