Hey guys, I’ve managed to create a quantized version of GPT-2 using onnxruntime, but I can’t seem to run it for some reason. Does anyone have a tutorial for it? Also, how would the model’s “generate” method work with it? Any ideas?

Gpt2 inference with onnx and quantize

lewtun February 3, 2021, 10:58am 6

FYI there’s a nice section in the docs that explains the various text generation strategies and how they’re implemented: Utilities for Generation — transformers 4.2.0 documentation
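To make the simplest of those strategies concrete, here is a rough sketch of greedy decoding written as a plain loop, which is roughly what you would have to do yourself once the model lives outside of `generate`. It is not the transformers implementation. The names `greedy_generate`, `next_token_logits` and `toy_logits` are made up for illustration, and the toy function only stands in for a real model so the snippet runs on its own.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    input_ids: List[int],
    max_new_tokens: int = 20,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding: repeatedly append the highest-scoring next token.

    `next_token_logits` is a hypothetical callback that takes the token ids
    so far and returns one score per vocabulary entry for the next position.
    """
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # argmax over the vocabulary (works for lists and 1-D numpy arrays)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        # Stop early if the model emits the end-of-sequence token
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return ids


def toy_logits(ids: List[int]) -> List[float]:
    """Toy stand-in for a model: always prefers (last token + 1) mod 5."""
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_logits, [0], max_new_tokens=3))
```

With ONNX Runtime, the callback would presumably wrap a call to `InferenceSession.run`, feed the ids as an int64 array, and return the logits of the last position. The input and output names depend on how the model was exported, so check them with `session.get_inputs()` first. Note that this sketch re-feeds the whole sequence at every step and does not reuse past key/values, so it will be slow for long outputs.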

