How to run inference for a T5 TensorRT model deployed on NVIDIA Triton?

I have deployed a T5 TensorRT model on the NVIDIA Triton Inference Server (config.pbtxt below), but I am running into problems when running inference against it with the Triton client.

According to the config.pbtxt, the TensorRT model expects 4 inputs, including the decoder IDs. But how can we send decoder_input_ids as an input to the model? I think the decoder input has to be generated from the model's own output.

Is there any way to run inference using the Triton client? (A rough sketch of what I have in mind is included after the config below.)

name: "tensorrt_model"
platform: "tensorrt_plan"
max_batch_size: 0
input [
 {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1, -1  ]
  },

{
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [-1, -1 ]
},

{
    name: "decoder_input_ids"
    data_type: TYPE_INT32
    dims: [ -1, -1]
},

{
   name: "decoder_attention_mask"
   data_type: TYPE_INT32
   dims: [ -1, -1 ]
}

]
output [
{
    name: "last_hidden_state"
    data_type: TYPE_FP32
    dims: [ -1, -1, 768 ]
  },

{
    name: "input.151"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }

]

instance_group [
    {
        count: 1
        kind: KIND_GPU
    }
]
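
For reference, this is roughly the client-side loop I have in mind. It is only a sketch built on assumptions that are not in the config: that the Triton HTTP endpoint is at localhost:8000, that the tokenizer is t5-base, and that the second output "input.151" is the LM-head logits over the vocabulary. It seeds decoder_input_ids with T5's decoder start token (the pad token, id 0) and greedily appends one token per request:

# Sketch only, not a verified solution. Assumptions (not in the config):
# Triton HTTP endpoint at localhost:8000, tokenizer "t5-base", and
# output "input.151" being the LM-head logits of shape [batch, dec_len, vocab].
import numpy as np
import tritonclient.http as httpclient
from transformers import T5Tokenizer

MODEL_NAME = "tensorrt_model"
MAX_NEW_TOKENS = 64

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # assumption: t5-base vocab
client = httpclient.InferenceServerClient(url="localhost:8000")

# Encode the source text; the plan expects INT32 inputs.
enc = tokenizer("translate English to German: Hello world", return_tensors="np")
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# T5 starts generation from its decoder start token, which is the pad token (id 0).
decoder_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int32)

for _ in range(MAX_NEW_TOKENS):
    decoder_mask = np.ones_like(decoder_ids, dtype=np.int32)

    # Build the 4 inputs declared in config.pbtxt.
    inputs = []
    for name, arr in [
        ("input_ids", input_ids),
        ("attention_mask", attention_mask),
        ("decoder_input_ids", decoder_ids),
        ("decoder_attention_mask", decoder_mask),
    ]:
        t = httpclient.InferInput(name, list(arr.shape), "INT32")
        t.set_data_from_numpy(arr)
        inputs.append(t)

    result = client.infer(
        model_name=MODEL_NAME,
        inputs=inputs,
        outputs=[httpclient.InferRequestedOutput("input.151")],
    )
    logits = result.as_numpy("input.151")          # assumed shape: [batch, dec_len, vocab]
    next_token = int(np.argmax(logits[0, -1, :]))  # greedy pick for the last position

    # Append the new token and feed the whole decoder sequence back on the next step.
    decoder_ids = np.concatenate(
        [decoder_ids, np.array([[next_token]], dtype=np.int32)], axis=1
    )
    if next_token == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_ids[0], skip_special_tokens=True))

Is something like this the intended approach with the Triton client?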