Ah, now I understand better what you’re trying to achieve. Indeed, you might have to write your own `generate` method so that you can integrate the `InferenceSession`
- there’s an example of doing text generation with GPT-2 in the ONNX repo here: onnxruntime/Inference_GPT2_with_OnnxRuntime_on_CPU.ipynb at master · microsoft/onnxruntime · GitHub
You could just adapt their approach to include the generation method you need (beam search, sampling, etc.)