GPT-2 inference with ONNX and quantization

Hey, @lewtun

I am sorry if I am asking something that has been answered before, but in order to run the quantized model I need to run it with `onnxruntime.InferenceSession`. How can that be combined with using the `generate` method? From what I understand, I need to copy the entire logic from the `generate` method and, instead of calling `self(...)`, call `session.run(None, ort_inputs)`. Please correct me if I'm wrong.
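
To make the question concrete, here is a minimal greedy-decoding sketch of what I think is needed. The file name, the `input_ids` input name, and the assumption that the logits are the first output are all guesses on my part and depend on how the model was exported:

```python
import numpy as np
import onnxruntime
from transformers import GPT2TokenizerFast

# Assumed file name; the actual input/output names can be inspected
# with session.get_inputs() / session.get_outputs().
session = onnxruntime.InferenceSession(
    "gpt2-quantized.onnx", providers=["CPUExecutionProvider"]
)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

input_ids = tokenizer("Hello, my name is", return_tensors="np").input_ids

for _ in range(20):  # generate up to 20 new tokens
    # Assumes the export takes only input_ids; some exports also
    # require attention_mask or past key/value tensors.
    ort_inputs = {"input_ids": input_ids}
    logits = session.run(None, ort_inputs)[0]  # assume logits are output 0
    # Greedy decoding: take the argmax over the last position's logits.
    next_token = np.argmax(logits[:, -1, :], axis=-1)
    input_ids = np.concatenate(
        [input_ids, next_token[:, None].astype(input_ids.dtype)], axis=-1
    )

print(tokenizer.decode(input_ids[0]))
```

This only covers greedy search, so reproducing beam search, sampling, etc. would mean copying much more of `generate`. Is that really the intended approach, or is there a way to wrap the session so `generate` can be reused directly?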