Ah, now I understand better what you’re trying to achieve. Indeed, you might have to write your own `generate` method so that you can integrate the `InferenceSession`
- there’s an example of doing text generation with GPT-2 in the ONNX repo here: onnxruntime/Inference_GPT2_with_OnnxRuntime_on_CPU.ipynb at master · microsoft/onnxruntime · GitHub
You could just adapt their approach to include the generation method you need (beam search, sampling, etc.)