Using ONNX for text generation with GPT-2

Hi @valhalla @patrickvonplaten, I was working with onnx_transformers, using ONNX with the GPT-2 model for a text-generation task. I used the transformers pipeline for text generation, but the runtime was quite high (20–30 s). I tried different approaches, like using cron jobs to handle it, but that didn't help. Then I found your repo and thought of using ONNX to accelerate text generation, but as I read the README on the repo, there is no text-generation support in onnx_transformers. I also tried some methods from this notebook: Inference_GPT2_with_OnnxRuntime_on_CPU, but the quality of the generated text was nowhere near the transformers pipeline. Would you please give me some insight into this runtime issue and how I can accelerate text generation besides increasing resources?
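For reference, here is roughly my baseline setup (a minimal sketch; the exact checkpoint, prompt, and generation parameters are placeholders):

```python
from transformers import pipeline

# Plain transformers text-generation pipeline on CPU
generator = pipeline("text-generation", model="gpt2")

# This call is what takes roughly 20-30 s in my environment
output = generator("Once upon a time", max_length=100, num_return_sequences=1)
print(output[0]["generated_text"])
```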
Thanks :slightly_smiling_face:


Hi,

We’ve recently added an example of exporting BART with ONNX, including beam search generation: https://github.com/huggingface/transformers/tree/master/examples/onnx/pytorch/translation

However, it doesn't include a README right now, which would be very useful for explaining exactly how the model can be used. I've asked the author to add one.


Thanks @nielsr

The new URL is here: https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization


Update here: text generation with ONNX models is now natively supported in Hugging Face Optimum, a library for optimizing, pruning, and quantizing Transformer-based models to run on all kinds of hardware.

For ONNX, the library implements ONNX Runtime counterparts of the classes available in Transformers. For instance, the counterpart of AutoModelForCausalLM is ORTModelForCausalLM (ORT = ONNX Runtime).

Check the guide here: Overview
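To make this concrete, here is a minimal sketch of ONNX text generation with Optimum (the checkpoint, prompt, and generation parameters are illustrative; see the guide above for the authoritative API):

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# ORT models plug directly into the standard transformers pipeline
onnx_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(onnx_generator("Hello, my name is", max_new_tokens=20))
```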
