Hey, @lewtun
Sorry if I'm asking something that has been answered before, but in order to run the quantized model I need to run it with `onnxruntime.InferenceSession`. How can that be combined with the `generate` method? From what I understand, I would need to copy the entire logic of `generate` and, instead of calling `self(...)`, call `session.run(None, ort_inputs)`. Please correct me if I'm wrong.
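
Something like this sketch is what I have in mind — a hand-rolled greedy loop calling the session instead of the model's forward. The model path, the input/output names, and the token limit are just placeholders I made up, not a working solution:

```python
import numpy as np
import onnxruntime
from transformers import AutoTokenizer

# Placeholder paths/names: adjust to the actual exported quantized model.
session = onnxruntime.InferenceSession("model-quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Hello, my name is", return_tensors="np")["input_ids"]

for _ in range(20):  # max new tokens (placeholder)
    ort_inputs = {
        "input_ids": input_ids.astype(np.int64),
        "attention_mask": np.ones_like(input_ids, dtype=np.int64),
    }
    # Assumes the logits are the first output of the exported graph.
    logits = session.run(None, ort_inputs)[0]
    # Greedy pick of the next token from the last position.
    next_token = logits[:, -1, :].argmax(axis=-1)
    input_ids = np.concatenate([input_ids, next_token[:, None]], axis=-1)

print(tokenizer.decode(input_ids[0]))
```

But that loses everything else `generate` does (beam search, sampling, stopping criteria, caching), so I'd rather not reimplement it if there's a supported way.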