Where to run inference on a fine-tuned sentence transformer model

jsilver · August 20, 2023, 3:06pm

I’ve fine-tuned a sentence transformer model (thenlper/gte-large) to generate embeddings. So far, I’ve used Google Colab to create the model. Now I need to generate embeddings for new sentences in real time (i.e. not as batch jobs). For now, I only need embeddings for 10K to 20K sentences per day, and the traffic is very bursty, but inference has to be real-time. What are the best options for running inference?
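For a rough sense of scale, here is a back-of-the-envelope sketch of what that volume means as a request rate. The daily totals come from the numbers above; the `active_hours` window is purely a hypothetical stand-in for "bursty", not something I have measured.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(sentences_per_day: int) -> float:
    """Sentences per second if the daily load were spread evenly over 24 hours."""
    return sentences_per_day / SECONDS_PER_DAY


def burst_rate(sentences_per_day: int, active_hours: float) -> float:
    """Sentences per second if the whole daily load arrived within `active_hours`.

    `active_hours` is a hypothetical parameter used only to illustrate burstiness.
    """
    return sentences_per_day / (active_hours * 60 * 60)


low = average_rate(10_000)
high = average_rate(20_000)
print(f"{low:.3f} to {high:.3f} sentences/second on average")

# Assumed worst case for illustration: all 20K sentences arrive within one hour.
print(f"{burst_rate(20_000, active_hours=1):.2f} sentences/second in a one-hour burst")
```

So the average load is well under one sentence per second, and even under the assumed one-hour burst it stays in the single digits per second. The hard part is latency during bursts, not sustained throughput.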

Thanks.
