Transformers / T5, jit trace, script, quantize

Is it possible, when using TorchServe for inference, to speed up inference for T5 specifically (or transformers in general) by doing any of the following:
jit.trace
jit.script
quantize
And if possible, how?
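
By quantize I mean something like post-training dynamic quantization of the Linear layers, roughly along these lines (a sketch only; I haven't verified speed or output quality on T5):

import torch
from transformers import T5ForConditionalGeneration

# Load the pretrained model in eval mode
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

# Dynamically quantize the Linear layers to int8; activations stay in float
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)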

When I simply try to trace and save a pretrained T5 model using:

traced_model = torch.jit.trace(model, (dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids))
torch.jit.save(traced_model, "t5_small_traced.pt")

I get an error message.
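
For context, the dummy inputs above were built roughly like this (the texts are arbitrary placeholders, and the decoder input is just a tokenized target string):

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Encoder-side dummy inputs
encoder_inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
dummy_input_ids = encoder_inputs.input_ids
dummy_attention_mask = encoder_inputs.attention_mask

# Decoder-side dummy input: any valid token ids, here a tokenized target sentence
dummy_decoder_input_ids = tokenizer("Hallo, Welt!", return_tensors="pt").input_ids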

Hey @ndvb! Did you solve it?

@rahulbhalley, Nope