Deploying Seq2Seq using ONNX on GPU

Has anyone deployed a seq2seq model by converting it to ONNX?

What I want to do:

  • Convert T5 (preferably, but any seq2seq model should work) to ONNX
  • Use the ONNX Runtime or TensorRT backend and host the ONNX model with Triton Inference Server
  • Run the model on the GPU
  • Do batch inference

Any help on this would be appreciated
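In case it helps anyone attempting the Triton side: below is a rough `config.pbtxt` sketch for serving the exported encoder with the ONNX Runtime backend, dynamic batching, and GPU placement. The model name, `max_batch_size`, queue delay, and the hidden size (512 is `t5-small`'s) are all assumptions to adapt; the tensor names follow Optimum's export convention. Note that Triton serves the encoder and decoder as separate ONNX models, so the autoregressive generate loop still has to be driven by the client or by a Python/BLS ensemble model.

```
name: "t5_encoder"
platform: "onnxruntime_onnx"
max_batch_size: 8

# Server-side batching: requests are merged up to max_batch_size,
# waiting at most this long for more requests to arrive.
dynamic_batching {
  max_queue_delay_microseconds: 100
}

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "last_hidden_state"
    data_type: TYPE_FP32
    dims: [ -1, 512 ]   # 512 = t5-small hidden size; adjust per checkpoint
  }
]

# Run on GPU.
instance_group [ { kind: KIND_GPU } ]
```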
