Hi! I have fine-tuned a wav2vec2 model on custom data for ASR. How can I deploy it on my own GPU server? What are the possible ways to set up my own server? Cloud hosting is very costly and I can't afford it. I want to deploy the model on my own GPU and give my customers an API to use it. How can I scale it to 1000 users?
If I deploy the model on my own server, do I need to create 1000 instances of the same model for 1000 customers to use it simultaneously?
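For reference, the common alternative to one model instance per customer is to load the model once and batch concurrent requests through it. Below is a minimal stdlib-only sketch of that idea under stated assumptions: `fake_transcribe_batch` is a hypothetical stand-in for a batched wav2vec2 forward pass plus decoding, and `MicroBatcher`, `max_batch_size`, and `max_wait_s` are illustrative names, not part of any library.

```python
import asyncio
from typing import List, Tuple

# Hypothetical stand-in for a batched wav2vec2 forward pass + CTC decoding.
# In a real server this would run the ONE shared model on the GPU.
def fake_transcribe_batch(audios: List[str]) -> List[str]:
    return [f"transcript of {a}" for a in audios]

class MicroBatcher:
    """Collects concurrent requests and runs them through a single model in batches."""

    def __init__(self, max_batch_size: int = 8, max_wait_s: float = 0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so the batching can be inspected

    async def submit(self, audio: str) -> str:
        # Each request gets a future that the worker resolves with its transcript.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((audio, fut))
        return await fut

    async def worker(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            first = await self.queue.get()
            batch = [first]
            deadline = loop.time() + self.max_wait_s
            # Keep collecting until the batch is full or the wait window closes.
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            results = fake_transcribe_batch([audio for audio, _ in batch])
            for (_, fut), text in zip(batch, results):
                fut.set_result(text)

async def main() -> Tuple[List[str], List[int]]:
    batcher = MicroBatcher(max_batch_size=8)
    worker_task = asyncio.create_task(batcher.worker())
    # Simulate 20 customers calling the API at the same time.
    results = await asyncio.gather(*(batcher.submit(f"clip{i}.wav") for i in range(20)))
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass
    return list(results), batcher.batch_sizes

if __name__ == "__main__":
    transcripts, sizes = asyncio.run(main())
    print(sizes)  # a few batches (at most 8 each), not 20 separate model calls
```

The design point is that concurrency is handled by the request queue, not by duplicating the model: a single GPU-resident model serves many callers, and batching raises throughput. Scaling further would typically mean adding more worker processes or GPUs behind a load balancer, each still holding one model copy.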