Hey, @lewtun
Sorry if I'm asking something that has been answered before, but in order to run the quantized model I need to run it with `onnxruntime.InferenceSession`. How can that be combined with the `generate` method? From what I understand, I would need to copy the entire logic of `generate` and, instead of calling `self(...)`, call `session.run(None, ort_inputs)`. Please correct me if I'm wrong.
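
Something like this sketch is what I have in mind — a hand-rolled greedy loop calling the session instead of the model's forward. The model path, the input/output names, and the token limit are just placeholders I made up, not a working solution:

```python
import numpy as np
import onnxruntime
from transformers import AutoTokenizer

# Placeholder paths/names: adjust to the actual exported quantized model.
session = onnxruntime.InferenceSession("model-quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Hello, my name is", return_tensors="np")["input_ids"]

for _ in range(20):  # max new tokens (placeholder)
    ort_inputs = {
        "input_ids": input_ids.astype(np.int64),
        "attention_mask": np.ones_like(input_ids, dtype=np.int64),
    }
    # Assumes the logits are the first output of the exported graph.
    logits = session.run(None, ort_inputs)[0]
    # Greedy pick of the next token from the last position.
    next_token = logits[:, -1, :].argmax(axis=-1)
    input_ids = np.concatenate([input_ids, next_token[:, None]], axis=-1)

print(tokenizer.decode(input_ids[0]))
```

But that loses everything else `generate` does (beam search, sampling, stopping criteria, caching), so I'd rather not reimplement it if there's a supported way.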