Has anyone deployed a seq2seq model by converting it to ONNX?
What I want to do:
- Convert T5 (preferably, but any seq2seq model should work) to ONNX
- Serve the ONNX model with Triton Inference Server, using its ONNX Runtime or TensorRT backend
- Run the model on the GPU
- Do batch inference
Any help on this would be appreciated.
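In case it helps the discussion, this is roughly the Triton model config I'm picturing for the encoder half. Everything here is a placeholder guess, not a verified T5 export signature: the model name, the `input_ids`/`hidden_states` tensor names, and the hidden size of 512 (t5-small) are my assumptions.

```
# Hypothetical config.pbtxt for an ONNX T5 encoder on Triton.
# Tensor names and dims are assumptions, not the actual export signature.
name: "t5_encoder"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input_ids"        # assumed name from the ONNX export
    data_type: TYPE_INT64
    dims: [ -1 ]             # variable sequence length
  }
]
output [
  {
    name: "hidden_states"    # assumed name; 512 = t5-small hidden size
    data_type: TYPE_FP32
    dims: [ -1, 512 ]
  }
]
instance_group [ { kind: KIND_GPU } ]
dynamic_batching { max_queue_delay_microseconds: 100 }
```

My understanding is that `dynamic_batching` plus `max_batch_size` is what lets Triton merge concurrent requests into one batched GPU call, but I'd love confirmation from anyone who has done this with a seq2seq model, especially how you handled the decoder's autoregressive loop.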