Hi @valhalla @patrickvonplaten, I was working with onnx_transformers, using ONNX for a GPT-2 model on a text-generation task. I used the Transformers pipeline for text generation and the runtime was quite high (20~30 s). I tried different approaches, such as using cron jobs to handle it, but that didn't help. Then I found your repo and thought of using ONNX to accelerate the text generation. As I read the README in the repo, there is no text-generation support in onnx_transformers. I also tried some methods from this notebook: Inference_GPT2_with_OnnxRuntime_on_CPU, but the quality of the generated text was nowhere near the Transformers pipeline. Would you please give me some insight into this runtime issue and how I can accelerate text generation besides increasing resources?
Thanks
Hi,
We’ve recently added an example of exporting BART with ONNX, including beam search generation: https://github.com/huggingface/transformers/tree/master/examples/onnx/pytorch/translation
However, it doesn’t include a README right now, which would be very useful for explaining exactly how the model can be used. I’ve asked the author to add one.
Thanks @nielsr
The new URL is here: https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization
Update here: text generation with ONNX models is now natively supported in Hugging Face Optimum. This library is meant for optimization/pruning/quantization of Transformer-based models so they can run on all kinds of hardware.
For ONNX, the library implements ONNX Runtime counterparts of the classes available in Transformers. For instance, the counterpart of AutoModel is called ORTModel in Optimum (ORT = ONNX Runtime), and for text generation there is ORTModelForCausalLM.
Check the guide here: the Overview page of the Optimum documentation.
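For reference, here is a minimal sketch (not from the thread) of generating text with an ONNX model via Optimum. It assumes `pip install optimum[onnxruntime] transformers` and a recent Optimum version where `export=True` performs the on-the-fly ONNX export (older releases used `from_transformers=True`); `gpt2` is just a placeholder checkpoint:

```python
# Minimal sketch: ONNX text generation with Optimum + ONNX Runtime.
# Assumes a recent Optimum version where export=True converts the
# PyTorch checkpoint to ONNX on the fly.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "gpt2"  # placeholder checkpoint; any causal LM should work

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

# The exported model plugs into the regular Transformers text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("ONNX Runtime makes text generation", max_new_tokens=40))
```

If the export step is slow, you can call `model.save_pretrained("gpt2_onnx")` once and load the saved ONNX folder afterwards, so the conversion doesn't happen on every run.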