I am trying to deploy a fine-tuned T5 model in production. Deploying a PyTorch model to production is new to me. I went through the Hugging Face presentation on YouTube about how they deploy models, along with some other blog posts.
HF mentions that they run the model in a Cython environment, since it gives roughly a 100x boost to inference speed. So, is it always advisable to run a model in production with Cython?
Does converting a model from PyTorch to TensorFlow help, and is it advisable or not?
What is the preferred container approach for running multiple models on a set of GPUs?
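For context, the kind of containerized setup I have in mind is roughly the following sketch, with one container per model (image tag, file names, and paths are placeholders):

```dockerfile
# Hypothetical sketch: one container per model, pinned to a GPU at run time.
# Base image with CUDA and PyTorch preinstalled (tag is a placeholder).
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the fine-tuned model weights and the serving script into the image.
COPY my-finetuned-t5/ ./my-finetuned-t5/
COPY serve.py .

# serve.py would expose an HTTP inference endpoint.
CMD ["python", "serve.py"]
```

I assume each container would then be pinned to a specific GPU with something like `docker run --gpus '"device=0"' ...`, but I am not sure whether one-container-per-model or one container serving several models is the better pattern.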
I know some of these questions may be basic, and I apologize for that, but I want to make sure I follow the correct guidelines when deploying a model to production.