GPT-2 inference with ONNX and quantization

Hey guys,
I’ve managed to create a quantized version of GPT-2 using ONNX Runtime, but I don’t seem to be able to run it for some reason. Does anyone have a tutorial for it? Also, how would the model’s “generate” method work with that? Any ideas?

Hi @yanagar25, when you say you cannot run the quantized version, what kind of error are you running into?

Here’s a notebook that explains how to export a pretrained model to the ONNX format: transformers/04-onnx-export.ipynb at master · huggingface/transformers · GitHub

You can also find more details here: Exporting transformers models — transformers 4.2.0 documentation
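For reference, the core of that export notebook boils down to something like this (a minimal sketch; the output path is just an example, and the target folder should be empty):

```python
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

# Export a pretrained GPT-2 checkpoint to ONNX.
# "onnx/gpt2.onnx" is only an example path.
convert(
    framework="pt",        # export from the PyTorch weights
    model="gpt2",
    output=Path("onnx/gpt2.onnx"),
    opset=11,
)
```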

I don’t see an obvious reason why the generate method should not work after quantization, so, as with most things in deep learning, the best advice is to just try it and see if it does :slight_smile:
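In case it helps with debugging, dynamic quantization plus a quick sanity check that the quantized graph actually loads might look something like this (the file names are just placeholders for wherever your exported model lives):

```python
import onnxruntime
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic (weight-only) quantization of the exported graph.
quantize_dynamic(
    "onnx/gpt2.onnx",            # input: the float32 ONNX export
    "onnx/gpt2-quantized.onnx",  # output: the int8 weights version
    weight_type=QuantType.QInt8,
)

# Sanity check: can the quantized graph be loaded, and what inputs does it expect?
session = onnxruntime.InferenceSession("onnx/gpt2-quantized.onnx")
print([inp.name for inp in session.get_inputs()])
```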

Hey, @lewtun

I’m sorry if I’m asking something that has been answered before, but in order to run the quantized model I need to run it with onnxruntime.InferenceSession. How can that be combined with using the generate method? From what I understand, I need to copy the entire logic of the generate method and, instead of using self(...), use session.run(None, ort_inputs). Please correct me if I’m wrong.

Ah now I understand better what you’re trying to achieve. Indeed you might have to write your own generate method so that you can integrate the InferenceSession - there’s an example of doing text generation with GPT-2 in the ONNX repo here: onnxruntime/Inference_GPT2_with_OnnxRuntime_on_CPU.ipynb at master · microsoft/onnxruntime · GitHub

You could just adapt their approach to include the generation strategy you need (beam search, sampling, etc.).
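To give a rough idea, a bare-bones greedy decoding loop around the InferenceSession could look like the sketch below. It assumes your exported graph takes input_ids and attention_mask and returns the LM logits as its first output, and it re-feeds the whole sequence at every step instead of reusing past key values the way the notebook does, so it’s simpler but slower:

```python
import numpy as np
import onnxruntime
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Placeholder path; assumed inputs: input_ids, attention_mask; first output: logits.
session = onnxruntime.InferenceSession("onnx/gpt2-quantized.onnx")

input_ids = tokenizer.encode("Hello, my name is", return_tensors="np")
attention_mask = np.ones_like(input_ids)

for _ in range(20):  # generate 20 new tokens greedily
    ort_inputs = {
        "input_ids": input_ids.astype(np.int64),
        "attention_mask": attention_mask.astype(np.int64),
    }
    logits = session.run(None, ort_inputs)[0]       # (batch, seq_len, vocab)
    next_token = logits[:, -1, :].argmax(axis=-1)   # greedy pick for the last position
    input_ids = np.concatenate([input_ids, next_token[:, None]], axis=-1)
    attention_mask = np.ones_like(input_ids)

print(tokenizer.decode(input_ids[0]))
```

Swapping the argmax for sampling or a beam search would give you the other generation strategies.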


Thank you so much for the reply! :slight_smile:


FYI there’s a nice section in the docs that explains the various text generation strategies and how they’re implemented: Utilities for Generation — transformers 4.2.0 documentation


I will definitely look into it! Thank you again :slight_smile:
