Deploying my own custom Llama model to production using Hugging Face

Hello everyone,

Our university project involves deploying a custom fine-tuned Llama model for our institute's chatbot. I'm new to Hugging Face and would appreciate guidance on deploying the model. We've already prepared the Chat UI and the model itself.

We want the chatbot to be usable by everyone, so of course we plan to host the model on a cloud service and run inference through an API. Scalability is crucial: we need to handle multiple queries simultaneously, and the model should support large context lengths in the prompt. Which Hugging Face service would be best in terms of both functionality and cost-effectiveness?
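For context, here is a rough sketch of how our Chat UI would call the deployed model over HTTP, assuming a Text Generation Inference style endpoint (the endpoint URL and token below are placeholders, not real values):

```python
import json
import urllib.request

# Placeholders: replace with the real endpoint URL and access token after deployment.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build a request body in the shape TGI's /generate route expects."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """POST the prompt to the endpoint and return the generated text."""
    req = urllib.request.Request(
        f"{ENDPOINT_URL}/generate",
        data=json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

Is this roughly the right integration pattern, or would a managed client (e.g. `huggingface_hub`'s `InferenceClient`) be the recommended route instead?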