Integration Issue with Finetuned Embedding Inference Endpoint

We recently created an inference endpoint to run a finetuned embedding model on your GPU service. While the endpoint itself functions as intended, the request format it requires is difficult to integrate with our existing codebase. We followed the guide provided here: https://huggingface.co/docs/inference-endpoints/guides/test_endpoint .
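For reference, the guide linked above queries the endpoint with plain HTTP requests. A minimal sketch of that flow is below; the endpoint URL and token are placeholders, and the `{"inputs": ...}` payload shape is the one used in the guide (your handler may differ):

```python
import requests

# Placeholder values -- substitute your own endpoint URL and token.
API_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
API_TOKEN = "hf_..."


def build_request(text: str, token: str) -> tuple[dict, dict]:
    """Build the headers and JSON payload assumed by the test_endpoint
    guide: a bearer token and an {"inputs": ...} body."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"inputs": text}
    return headers, payload


def query(text: str) -> list:
    """POST a single text to the endpoint and return the parsed JSON
    (expected to be an embedding vector or a list of them)."""
    headers, payload = build_request(text, API_TOKEN)
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()
```

This works, but it leaves the payload construction and response parsing in our own code, which is exactly the integration burden described below.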

For a more seamless integration, we would like to use something closer to the `SentenceTransformer` interface (i.e., `from sentence_transformers import SentenceTransformer; embed_model = SentenceTransformer(model)`), as it would fit more naturally with our current implementation.

We attempted to use the `TextEmbeddingsInference` class from `llama_index.embeddings.text_embeddings_inference` (`embed_model = TextEmbeddingsInference(model_name=model, base_url=API_URL, timeout=60, auth_token=key)`).

However, when we call `embed_model.get_text_embedding("Test")`, we encounter an error.
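In the meantime, we have considered a thin wrapper that exposes a `SentenceTransformer`-style `encode()` over the raw endpoint. This is only a sketch: it assumes the endpoint accepts an `{"inputs": [...]}` payload (as in the test guide) and returns a plain list of embedding vectors, and the class name and injectable `post` parameter are our own invention for testability:

```python
import requests


class EndpointEmbedder:
    """Minimal SentenceTransformer-style wrapper around an Inference
    Endpoint. Sketch only: assumes the endpoint takes
    {"inputs": [...]} and returns a list of embedding vectors."""

    def __init__(self, api_url: str, token: str, post=requests.post):
        self.api_url = api_url
        self.headers = {"Authorization": f"Bearer {token}"}
        self._post = post  # injectable transport, handy for unit tests

    def encode(self, sentences):
        """Embed a string or a list of strings, mirroring
        SentenceTransformer.encode's single-vs-batch behavior."""
        single = isinstance(sentences, str)
        inputs = [sentences] if single else list(sentences)
        response = self._post(
            self.api_url, headers=self.headers, json={"inputs": inputs}
        )
        response.raise_for_status()
        embeddings = response.json()
        return embeddings[0] if single else embeddings
```

Usage would then look like `embed_model = EndpointEmbedder(API_URL, key)` followed by `embed_model.encode("Test")`, but we would prefer a supported path over maintaining this ourselves.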

Would you be able to provide guidance on this issue or suggest any corrections to our approach? Any assistance in resolving this would be greatly appreciated, as using SentenceTransformer or TextEmbeddingsInference directly would substantially streamline our integration process.

Thank you for your time and support.
