Hi rpelissier,
Sorry about the hassle here. I did a deep dive on this issue and I think I know what's going on: the model deployed in your Inference Endpoint uses the TEI server engine, whereas the local example uses sentence-transformers, and unfortunately there's a mismatch between the two implementations. This model is one of the few that uses a Dense module, which is supported in sentence-transformers but not in TEI.
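For reference, you can list the model's modules locally to see where the Dense layer sits (just a quick check; the module order comes from the model's files on the Hub):

from sentence_transformers import SentenceTransformer

# LaBSE is a pipeline of modules: 0 Transformer, 1 Pooling, 2 Dense, 3 Normalize
model = SentenceTransformer("sentence-transformers/LaBSE")
for idx, module in enumerate(model):
    print(idx, type(module).__name__)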
So when the model is run with TEI (and therefore on Inference Endpoints), it's equivalent to doing this in sentence-transformers:
from sentence_transformers import SentenceTransformer
import torch

sentences = ["This is an example sentence", "Each sentence is converted"]

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}.")

# Full LaBSE pipeline: Transformer, Pooling, Dense, Normalize
model = SentenceTransformer("sentence-transformers/LaBSE").to(device)
embeddings = model.encode(sentences)
print("default", embeddings)

# Same model, but with the Dense module (index 2) removed, mimicking what TEI computes
edited_model = SentenceTransformer("sentence-transformers/LaBSE").to(device)
del edited_model[2]
embeddings = edited_model.encode(sentences)
print("del model[2]:", embeddings)
This gives the following output:
default [[ 0.02882483 -0.00602379 -0.05947006 ... -0.03002251 -0.029607
0.00067482]
[-0.05550232 0.02546485 -0.02157257 ... 0.02932104 0.0115004
-0.00848789]]
del model[2]: [[-0.00814162 0.01150823 -0.01516913 ... -0.02249936 0.02313923
-0.02578063]
[ 0.00584357 0.03796612 0.0039336 ... 0.03305857 0.03542801
0.0157448 ]]
The former matches the results in your post above, and the latter should be close to what the model deployed on Inference Endpoints with TEI returns.
This is indeed not ideal, and I've notified the TEI maintainers so they can either add support for the Dense module or clearly indicate that models like this one aren't supported in TEI.
As a workaround, when you deploy this model on Inference Endpoints, you can select the "Default" container instead of the TEI one. The Default container is a simple wrapper around the sentence-transformers library, so it's not as performant as TEI, but it should give you the correct embeddings.
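If you want to sanity-check the endpoint after switching containers, here's a minimal sketch that compares the remote embeddings against the local sentence-transformers ones (the endpoint URL is a placeholder for your own, and depending on the container the response shape may need a small reshape):

import numpy as np
from huggingface_hub import InferenceClient
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Local reference embeddings from sentence-transformers
local = SentenceTransformer("sentence-transformers/LaBSE").encode(sentences)

# Remote embeddings from your Inference Endpoint (placeholder URL, replace with your own)
client = InferenceClient(model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud")
remote = np.vstack([np.asarray(client.feature_extraction(s)).reshape(1, -1) for s in sentences])

# With the "Default" container these should agree up to small numerical differences
print(np.allclose(local, remote, atol=1e-3))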
Hopefully this helps!