Deploying a Model On-prem

If I have access to a V100 GPU, how can I deploy the model onto it? Is there a library available that facilitates managing API calls to the model? Additionally, I'm curious whether there are any tutorials on this topic. My plan is to invoke the model via an API, using Flask or another framework, and return the output generated by the LLM. How can I effectively maintain this backend and ensure optimal resource utilization?
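A minimal sketch of what this plan could look like, assuming Flask is installed. The route name, payload shape, and model are illustrative assumptions, not a specific recommendation; the real generation step (shown in comments using the standard `transformers` pipeline API) is stubbed out here so the sketch runs without a GPU:

```python
# Sketch: a Flask endpoint wrapping an LLM. Route and payload are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

# In a real deployment, load the model ONCE at startup, not per request, e.g.:
#   from transformers import pipeline
#   generator = pipeline("text-generation", model="gpt2", device=0)  # device=0 -> first GPU
# Stubbed out below so the sketch is runnable without transformers or a GPU.
def generate(prompt: str) -> str:
    # Placeholder for: generator(prompt, max_new_tokens=128)[0]["generated_text"]
    return f"echo: {prompt}"

@app.route("/generate", methods=["POST"])
def generate_endpoint():
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    return jsonify({"output": generate(prompt)})

if __name__ == "__main__":
    # threaded=False serializes requests, so a single GPU never receives
    # concurrent generate() calls; for real load, put gunicorn in front
    # with one worker per GPU.
    app.run(host="0.0.0.0", port=8000, threaded=False)
```

For resource utilization, the key points are loading the model once at process startup and limiting concurrency to what one GPU can handle; Flask's built-in server is for development only, so a WSGI server such as gunicorn is the usual choice in production.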

I have been trying to do the same thing for hours. I'm using a Flask API. It is possible to invoke the LLM via a REST API, but my Flask API isn't working… :smiling_face_with_tear:
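For reference, this is roughly what the client side of such a REST call could look like, using only the Python standard library. The URL and JSON field names are assumptions and would need to match whatever routes your Flask app actually defines:

```python
# Sketch of calling a hypothetical LLM REST endpoint (URL and payload
# shape are assumptions - adjust to your own Flask routes).
import json
from urllib.request import Request, urlopen

def build_request(prompt: str, url: str = "http://localhost:8000/generate") -> Request:
    """Build a POST request carrying the prompt as JSON."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    # Passing data= makes urllib issue a POST by default.
    return Request(url, data=payload, headers={"Content-Type": "application/json"})

def query_llm(prompt: str, url: str = "http://localhost:8000/generate") -> str:
    """Send the prompt and return the generated text from the JSON response."""
    with urlopen(build_request(prompt, url)) as resp:
        return json.loads(resp.read())["output"]
```

A common cause of a Flask API "not working" with POSTed JSON is a mismatch here: the client must send a `Content-Type: application/json` header (or the server must use `request.get_json(force=True)`), and the route must declare `methods=["POST"]`.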