Where to run inference on a fine-tuned sentence transformer model

I’ve fine-tuned a sentence transformer model (thenlper/gte-large) to generate embeddings. So far, I’ve used Google Colab to create the model. Now I need to generate embeddings of new sentences in real time (i.e. not as batch jobs). For now, I only want to generate embeddings for 10K-20K sentences/day, and the traffic is very bursty, but I still need real-time inference. What are the best options for running inference?
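For a sense of scale, here is a rough back-of-the-envelope calculation of the average load. It assumes the traffic is spread evenly over 24 hours, which it is not (it is bursty), so peaks will be much higher than these figures. The helper name `average_rate` is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(sentences_per_day: int) -> float:
    """Average sentences per second, assuming (unrealistically) even traffic."""
    return sentences_per_day / SECONDS_PER_DAY


low = average_rate(10_000)   # lower bound of the daily volume
high = average_rate(20_000)  # upper bound of the daily volume

print(f"{low:.2f} to {high:.2f} sentences/second on average")
```

So the average is well under one sentence per second, meaning a dedicated always-on instance would sit idle most of the time and the real constraint is latency during bursts.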