Hi rpelissier,
Sorry about the hassle here. I did a deep dive on this issue and I think I know what's going on: the model deployed in your Inference Endpoint uses the TEI server engine, whereas the local example uses sentence-transformers, and unfortunately there's a mismatch between the two implementations. This model is one of the few that uses a Dense module, which is supported in sentence-transformers but not in TEI.
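For reference, you can list the model's modules locally to see where the Dense layer sits (just a quick check; the module order comes from the model's files on the Hub):

from sentence_transformers import SentenceTransformer

# LaBSE is a pipeline of modules: 0 Transformer, 1 Pooling, 2 Dense, 3 Normalize
model = SentenceTransformer("sentence-transformers/LaBSE")
for idx, module in enumerate(model):
    print(idx, type(module).__name__)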
So when the model is run with TEI (and therefore on Inference Endpoints), it's equivalent to doing this in sentence-transformers:
from sentence_transformers import SentenceTransformer
import torch

sentences = ["This is an example sentence", "Each sentence is converted"]

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}.")

# Full LaBSE pipeline: Transformer, Pooling, Dense, Normalize
model = SentenceTransformer("sentence-transformers/LaBSE").to(device)
embeddings = model.encode(sentences)
print("default", embeddings)

# Same model, but with the Dense module (index 2) removed, mimicking what TEI computes
edited_model = SentenceTransformer("sentence-transformers/LaBSE").to(device)
del edited_model[2]
embeddings = edited_model.encode(sentences)
print("del model[2]:", embeddings)
This gives the following output:
default [[ 0.02882483 -0.00602379 -0.05947006 ... -0.03002251 -0.029607
0.00067482]
[-0.05550232 0.02546485 -0.02157257 ... 0.02932104 0.0115004
-0.00848789]]
del model[2]: [[-0.00814162 0.01150823 -0.01516913 ... -0.02249936 0.02313923
-0.02578063]
[ 0.00584357 0.03796612 0.0039336 ... 0.03305857 0.03542801
0.0157448 ]]
The former matches the results in your post above, and the latter should be close to what the model deployed on Inference Endpoints with TEI returns.
This is indeed not ideal, and I've notified the TEI maintainers so they can either add support for the Dense module or clearly indicate that models like this one aren't supported in TEI.
As a workaround, when you deploy this model on Inference Endpoints, you can select the "Default" container instead of the TEI one. The Default container is a simple wrapper around the sentence-transformers library, so it's not as performant as TEI, but it should give you the correct embeddings.
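If you want to sanity-check the endpoint after switching containers, here's a minimal sketch that compares the remote embeddings against the local sentence-transformers ones (the endpoint URL is a placeholder for your own, and depending on the container the response shape may need a small reshape):

import numpy as np
from huggingface_hub import InferenceClient
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Local reference embeddings from sentence-transformers
local = SentenceTransformer("sentence-transformers/LaBSE").encode(sentences)

# Remote embeddings from your Inference Endpoint (placeholder URL, replace with your own)
client = InferenceClient(model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud")
remote = np.vstack([np.asarray(client.feature_extraction(s)).reshape(1, -1) for s in sentences])

# With the "Default" container these should agree up to small numerical differences
print(np.allclose(local, remote, atol=1e-3))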
Hopefully this helps!