How to deploy a fine-tuned T5 model in production

as-stevens · December 21, 2020, 8:56pm

Hi All,

I am trying to deploy a fine-tuned T5 model in production. Deploying a PyTorch model in production is new to me. I watched the Hugging Face presentation on YouTube about how they deploy their models, and I also read some other blog posts.

HF mentioned that they deploy models in a Cython environment because it gives a ~100x inference speedup. So, is it always advisable to run a model in production on Cython?
Does converting a model from PyTorch to TF help, and is it advisable?
What is the preferred container approach for running multiple models on a set of GPUs?

I know some of these questions may be basic, and I apologize for that, but I want to make sure I follow the correct guidelines for deploying a model in production.

Thank you
Amit

Narsil · December 22, 2020, 8:45am

Hi @as-stevens,

I don’t know which blog post you’re referring to for using Cython to get a 100x speedup, but I guess it really depends on where the bottleneck is.
T5 models are Seq2Seq models, so I would recommend sticking to PyTorch and finding a way to optimize the hot path (the decoder path). TF could work, but transformers currently can’t use various graph optimizations in TF (we’re working on it).
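To make the "decoder is the hot path" point concrete, here is a toy cost model. It is an illustration under stated assumptions, not a benchmark and not how transformers measures anything: it counts only forward passes and token positions pushed through each stack, and deliberately ignores layer count, hidden size, and the growing attention context. The function name `generation_cost` is hypothetical. The caching branch corresponds to reusing past key/value states during generation (the `use_cache` option in transformers).

```python
"""Toy cost model for greedy Seq2Seq (e.g. T5) generation.

Assumption (not a benchmark): cost is counted as forward passes and as
token positions pushed through each stack; layer count, hidden size and
attention's growing context are deliberately ignored.
"""


def generation_cost(src_len: int, tgt_len: int, use_cache: bool) -> dict:
    """Return hypothetical work counts for encoding src_len tokens and
    generating tgt_len tokens one at a time."""
    # The encoder runs a single forward pass over the whole source sequence.
    encoder_passes = 1
    encoder_tokens = src_len

    # The decoder runs one forward pass per generated token, sequentially.
    decoder_passes = tgt_len
    if use_cache:
        # With cached key/value states, step t only feeds the newest token.
        decoder_tokens = tgt_len
    else:
        # Without a cache, step t re-feeds all t tokens generated so far:
        # 1 + 2 + ... + tgt_len = tgt_len * (tgt_len + 1) / 2.
        decoder_tokens = tgt_len * (tgt_len + 1) // 2

    return {
        "encoder_passes": encoder_passes,
        "encoder_tokens": encoder_tokens,
        "decoder_passes": decoder_passes,
        "decoder_tokens": decoder_tokens,
    }


if __name__ == "__main__":
    # Example: 512-token input, 128-token output.
    for cache in (False, True):
        print(f"use_cache={cache}:", generation_cost(512, 128, cache))
```

With a 512-token input and a 128-token output, this model gives one encoder pass against 128 sequential decoder passes, and 8,256 decoder token positions without caching versus 128 with it. Under these assumptions, any per-step overhead (Python code, kernel launches, host/device syncs) is paid 128 times on the decoder side and only once on the encoder side, which is why the decoder loop is usually the place to optimize first.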

Alternatively, you can try running it on our hosted Inference API to avoid the hassle of managing all the different layers: https://huggingface.co/pricing (some optimizations are only enabled for customers).

Hope that helps.
Cheers,
Nicolas
