Integration Issue with Finetuned Embedding Inference Endpoint

We recently created an inference endpoint to run a finetuned embedding model on your GPU service. While the endpoint itself functions as intended, the request format it requires is difficult to integrate with our existing codebase. We followed the guide provided here: https://huggingface.co/docs/inference-endpoints/guides/test_endpoint .
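For reference, the guide linked above queries the endpoint with plain HTTP requests. A minimal sketch of that flow is below; the endpoint URL and token are placeholders, and the `{"inputs": ...}` payload shape is the one used in the guide (your handler may differ):

```python
import requests

# Placeholder values -- substitute your own endpoint URL and token.
API_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
API_TOKEN = "hf_..."


def build_request(text: str, token: str) -> tuple[dict, dict]:
    """Build the headers and JSON payload assumed by the test_endpoint
    guide: a bearer token and an {"inputs": ...} body."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"inputs": text}
    return headers, payload


def query(text: str) -> list:
    """POST a single text to the endpoint and return the parsed JSON
    (expected to be an embedding vector or a list of them)."""
    headers, payload = build_request(text, API_TOKEN)
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()
```

This works, but it leaves the payload construction and response parsing in our own code, which is exactly the integration burden described below.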

For a more seamless integration, we would like to use something closer to the `SentenceTransformer` interface (i.e., `from sentence_transformers import SentenceTransformer; embed_model = SentenceTransformer(model)`), as it would fit more naturally with our current implementation.

We attempted to use the `TextEmbeddingsInference` class from `llama_index.embeddings.text_embeddings_inference` (`embed_model = TextEmbeddingsInference(model_name=model, base_url=API_URL, timeout=60, auth_token=key)`).

However, when we call `embed_model.get_text_embedding("Test")`, we encounter an error.
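In the meantime, we have considered a thin wrapper that exposes a `SentenceTransformer`-style `encode()` over the raw endpoint. This is only a sketch: it assumes the endpoint accepts an `{"inputs": [...]}` payload (as in the test guide) and returns a plain list of embedding vectors, and the class name and injectable `post` parameter are our own invention for testability:

```python
import requests


class EndpointEmbedder:
    """Minimal SentenceTransformer-style wrapper around an Inference
    Endpoint. Sketch only: assumes the endpoint takes
    {"inputs": [...]} and returns a list of embedding vectors."""

    def __init__(self, api_url: str, token: str, post=requests.post):
        self.api_url = api_url
        self.headers = {"Authorization": f"Bearer {token}"}
        self._post = post  # injectable transport, handy for unit tests

    def encode(self, sentences):
        """Embed a string or a list of strings, mirroring
        SentenceTransformer.encode's single-vs-batch behavior."""
        single = isinstance(sentences, str)
        inputs = [sentences] if single else list(sentences)
        response = self._post(
            self.api_url, headers=self.headers, json={"inputs": inputs}
        )
        response.raise_for_status()
        embeddings = response.json()
        return embeddings[0] if single else embeddings
```

Usage would then look like `embed_model = EndpointEmbedder(API_URL, key)` followed by `embed_model.encode("Test")`, but we would prefer a supported path over maintaining this ourselves.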

Would you be able to provide guidance on this issue or suggest any corrections to our approach? Any assistance in resolving this would be greatly appreciated, as using SentenceTransformer or TextEmbeddingsInference directly would substantially streamline our integration process.

Thank you for your time and support.
