Transformers / T5 , jit trace, script, quantize

Is it possible, when using TorchServe for inference, to improve the speed of inference for T5 specifically (or transformers in general) by applying torch.jit.trace, torch.jit.script, or quantization?
And if so, how?
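For the quantization route, dynamic quantization of the Linear layers is the usual low-effort option. A minimal sketch on a small stand-in model (not T5 itself), assuming a recent PyTorch with `torch.quantization.quantize_dynamic`:

```python
import torch
import torch.nn as nn

# Stand-in model; for T5 you would load the pretrained model instead.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Dynamic quantization converts Linear weights to int8 ahead of time;
# activations are quantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 64))
print(tuple(out.shape))  # (1, 10)
```

Whether this actually speeds up T5 depends on how much of the runtime is spent in Linear layers; it's worth benchmarking before and after.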

When I simply try to save/export a pretrained model using:

traced_model = torch.jit.trace(model, (dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids))

I get an error message.
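One common cause (an assumption here, since the error message isn't shown): `torch.jit.trace` only accepts tensors or tuples of tensors as outputs, while transformers models return a `ModelOutput` dict by default; loading with `from_pretrained(..., torchscript=True)` switches them to tuple outputs. A minimal sketch of the working pattern on a tiny stand-in module with the same call signature:

```python
import torch
import torch.nn as nn

# Tiny stand-in for the encoder-decoder call signature used above.
# For the real model, the usual fix is something like:
#   model = T5ForConditionalGeneration.from_pretrained("t5-small", torchscript=True)
# so that forward returns tuples instead of a ModelOutput dict.
class StandIn(nn.Module):
    def forward(self, input_ids, attention_mask, decoder_input_ids):
        # Return a tuple of tensors: torch.jit.trace rejects dict outputs,
        # which is the default return type of transformers models.
        return (input_ids * attention_mask + decoder_input_ids,)

model = StandIn().eval()
dummy_input_ids = torch.ones(1, 8, dtype=torch.long)
dummy_attention_mask = torch.ones(1, 8, dtype=torch.long)
dummy_decoder_input_ids = torch.zeros(1, 8, dtype=torch.long)

traced_model = torch.jit.trace(
    model, (dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids)
)
out = traced_model(dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids)
print(tuple(out[0].shape))  # (1, 8)
```

Note also that tracing fixes the input shapes seen at trace time, which is awkward for generation with T5 where decoder inputs grow step by step.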

Hey @ndvb! Did you solve it?

@rahulbhalley, Nope