Is it possible, when using TorchServe for inference, to improve the inference speed of T5 specifically (or transformers in general) by doing any of the following:
- `jit.trace`
- `jit.script`
- quantization
And if possible, how?
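For the quantization option, what I have in mind is something like dynamic (post-training) quantization of the `Linear` layers. An untested sketch, using `t5-small` purely as an example:

```python
import torch
from transformers import T5ForConditionalGeneration

# Load a pretrained T5 and put it in eval mode before quantizing.
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

# Dynamic quantization of the Linear layers to int8 (CPU inference);
# I'm assuming this is the right entry point for transformer models.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```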
When I simply try to save/export a pretrained model using:
```python
traced_model = torch.jit.trace(model, (dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids))
torch.jit.save(traced_model, "t5_small_traced.pt")
```
I get an error message.
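For context, here is roughly the full snippet I am running. The dummy inputs are built with the `t5-small` tokenizer, and `torchscript=True` is something I added after reading the Hugging Face TorchScript docs; whether it is actually required here is part of what I'm unsure about:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# torchscript=True is supposed to clone the tied embedding weights so the
# model can be traced (my assumption from the HF TorchScript docs).
model = T5ForConditionalGeneration.from_pretrained("t5-small", torchscript=True)
model.eval()

tokenizer = T5Tokenizer.from_pretrained("t5-small")
enc = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
dummy_input_ids = enc["input_ids"]
dummy_attention_mask = enc["attention_mask"]
# The decoder needs its own start token; for T5 that is the configured
# decoder_start_token_id (the pad token).
dummy_decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

traced_model = torch.jit.trace(
    model, (dummy_input_ids, dummy_attention_mask, dummy_decoder_input_ids)
)
torch.jit.save(traced_model, "t5_small_traced.pt")
```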