Deploying a Model On-prem

If I have access to a V100 GPU, how can I deploy the model onto it? Is there a library available that facilitates managing API calls to the model? Additionally, I'm curious whether there are any tutorials on this topic. My plan is to invoke the model via an API, using Flask or another framework, and return the output generated by the LLM. How can I effectively maintain this backend and ensure optimal resource utilization?
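A minimal sketch of what this plan could look like, assuming Flask is installed. The route name, payload shape, and model are illustrative assumptions, not a specific recommendation; the real generation step (shown in comments using the standard `transformers` pipeline API) is stubbed out here so the sketch runs without a GPU:

```python
# Sketch: a Flask endpoint wrapping an LLM. Route and payload are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

# In a real deployment, load the model ONCE at startup, not per request, e.g.:
#   from transformers import pipeline
#   generator = pipeline("text-generation", model="gpt2", device=0)  # device=0 -> first GPU
# Stubbed out below so the sketch is runnable without transformers or a GPU.
def generate(prompt: str) -> str:
    # Placeholder for: generator(prompt, max_new_tokens=128)[0]["generated_text"]
    return f"echo: {prompt}"

@app.route("/generate", methods=["POST"])
def generate_endpoint():
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    return jsonify({"output": generate(prompt)})

if __name__ == "__main__":
    # threaded=False serializes requests, so a single GPU never receives
    # concurrent generate() calls; for real load, put gunicorn in front
    # with one worker per GPU.
    app.run(host="0.0.0.0", port=8000, threaded=False)
```

For resource utilization, the key points are loading the model once at process startup and limiting concurrency to what one GPU can handle; Flask's built-in server is for development only, so a WSGI server such as gunicorn is the usual choice in production.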

I have been trying to do the same thing for hours. I'm using a Flask API. It is possible to invoke the LLM via a REST API, but my Flask API isn't working… :smiling_face_with_tear:
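For reference, this is roughly what the client side of such a REST call could look like, using only the Python standard library. The URL and JSON field names are assumptions and would need to match whatever routes your Flask app actually defines:

```python
# Sketch of calling a hypothetical LLM REST endpoint (URL and payload
# shape are assumptions - adjust to your own Flask routes).
import json
from urllib.request import Request, urlopen

def build_request(prompt: str, url: str = "http://localhost:8000/generate") -> Request:
    """Build a POST request carrying the prompt as JSON."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    # Passing data= makes urllib issue a POST by default.
    return Request(url, data=payload, headers={"Content-Type": "application/json"})

def query_llm(prompt: str, url: str = "http://localhost:8000/generate") -> str:
    """Send the prompt and return the generated text from the JSON response."""
    with urlopen(build_request(prompt, url)) as resp:
        return json.loads(resp.read())["output"]
```

A common cause of a Flask API "not working" with POSTed JSON is a mismatch here: the client must send a `Content-Type: application/json` header (or the server must use `request.get_json(force=True)`), and the route must declare `methods=["POST"]`.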