How to deploy a model on a custom server?

Hi! I have fine-tuned a wav2vec2 model on custom data for ASR. How can I deploy it on my own GPU server? What are the possible ways to set up our own server? Cloud hosting is very costly and I cannot afford it. I want to deploy the model on my own GPU and give my customers an API for using it. How can I scale it to 1000 users?
If I deploy the model on my own server, do I need to create 1000 instances of the same model for 1000 customers to use it simultaneously?


Usually people use Kubernetes in production, which scales Docker containers automatically based on the load.

This means that you would first need to wrap your model in an API and package that API in a Docker container. The API itself is typically built using Flask or FastAPI.
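
To make the "wrap it in an API" step concrete, here is a minimal sketch using FastAPI. It is only one possible layout, not a definitive implementation. The checkpoint path `MODEL_DIR`, the `/transcribe` route, the 10 MiB upload limit and all function names are assumptions made for illustration. It also assumes the client sends the raw bytes of a mono audio file that is already at the model's sampling rate, and that `fastapi`, `uvicorn`, `torch`, `transformers` and `soundfile` are installed.

```python
import io
from typing import Callable

MODEL_DIR = "./wav2vec2-finetuned"   # hypothetical path to the fine-tuned checkpoint
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # illustrative 10 MiB upload limit


def validate_upload(data: bytes) -> None:
    """Reject empty or oversized request bodies before they reach the model."""
    if not data:
        raise ValueError("empty audio payload")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("audio payload too large")


def load_transcriber(model_dir: str = MODEL_DIR) -> Callable[[bytes], str]:
    """Load the model once and return a function mapping audio bytes to text."""
    # Heavy imports are deferred so importing this module stays cheap.
    import soundfile as sf
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = Wav2Vec2Processor.from_pretrained(model_dir)
    model = Wav2Vec2ForCTC.from_pretrained(model_dir).to(device).eval()

    def transcribe(data: bytes) -> str:
        # Assumes mono audio already at the model's sampling rate (often 16 kHz).
        speech, sample_rate = sf.read(io.BytesIO(data), dtype="float32")
        inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values.to(device)).logits
        # Greedy CTC decoding: take the most likely token at each frame.
        ids = torch.argmax(logits, dim=-1)
        return processor.batch_decode(ids)[0]

    return transcribe


def create_app(transcribe: Callable[[bytes], str]):
    """Build a FastAPI app exposing POST /transcribe around `transcribe`."""
    from fastapi import FastAPI, HTTPException, Request
    from fastapi.concurrency import run_in_threadpool

    app = FastAPI()

    @app.post("/transcribe")
    async def transcribe_endpoint(request: Request) -> dict:
        data = await request.body()  # raw audio bytes, e.g. a WAV file
        try:
            validate_upload(data)
        except ValueError as err:
            raise HTTPException(status_code=400, detail=str(err))
        # Run blocking inference in a worker thread so the event loop stays free.
        text = await run_in_threadpool(transcribe, data)
        return {"text": text}

    return app


def make_app():
    """Application factory: loads the model, then builds the app."""
    return create_app(load_transcriber())
```

If this were saved as `server.py` (a hypothetical file name), it could be started with `uvicorn server:make_app --factory --host 0.0.0.0 --port 8000`, and the same command could serve as the start command of the Docker container. The model is loaded once per process and shared by every request that process handles.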

Next, the Docker container could be automatically scaled using Kubernetes. Personally I don’t know whether it’s feasible to run Kubernetes locally, but I assume you can.