from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)
Just a few questions if someone has a few moments:
How are these embeddings different from the contextual embeddings I would get with DistilBERT and other transformer models?
More importantly, once I have the embeddings, I can simply compute a cosine similarity metric with other sentences to cluster them by similarity. If so, what is the need for the API described here: Sentence Transformers in the Hugging Face Hub? Am I missing something more subtle?
in general, this approach gives higher-quality sentence embeddings than those you’d get from distilbert etc., and you can find a nice performance chart here. the main difference is that sentence-transformers models pool the token-level contextual embeddings into a single fixed-size vector and are fine-tuned on sentence-pair tasks, so the pooled vector is directly useful for similarity and clustering, whereas pooling raw distilbert outputs tends to give weaker sentence representations
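to make that concrete, here’s a rough sketch (assuming transformers, torch and sentence-transformers are installed) of the difference: a plain distilbert checkpoint gives you one contextual embedding per token, which you’d still have to pool yourself, while a SentenceTransformer model does the pooling for you:

# sketch: token-level contextual embeddings vs pooled sentence embeddings
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

sentence = ["This is an example sentence"]

# plain distilbert: one contextual embedding per token
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
backbone = AutoModel.from_pretrained("distilbert-base-uncased")
with torch.no_grad():
    token_embeddings = backbone(**tokenizer(sentence, return_tensors="pt")).last_hidden_state
print(token_embeddings.shape)  # (1, num_tokens, 768) -- one vector per token

# sentence-transformers: pooling is applied for you, and the model was
# fine-tuned so the pooled vector works well for similarity tasks
st_model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
sentence_embedding = st_model.encode(sentence)
print(sentence_embedding.shape)  # (1, 384) -- one vector per sentence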
regarding your second question, i’m not sure which api you’re referring to exactly in the blog post (which is mostly about the integration of sentence-transformers with the hugging face hub). but indeed, once you have the embeddings you can compute metrics / cluster using whatever tools you wish
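for example, a quick local sketch (no api involved; assumes scikit-learn is also installed), reusing `embeddings` from the first snippet:

from sentence_transformers import util
from sklearn.cluster import KMeans

# pairwise cosine similarities, computed locally from the embeddings
similarities = util.cos_sim(embeddings, embeddings)  # (n_sentences, n_sentences) matrix
print(similarities)

# toy example: group the sentences into 2 clusters based on their embeddings
labels = KMeans(n_clusters=2).fit_predict(embeddings)
print(labels)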
hi @lewtun, thanks for this useful reference, I will look at it shortly. As for the API, I am referring to this part:
import json
import requests

API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/paraphrase-MiniLM-L6-v2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query(
    {
        "inputs": {
            "source_sentence": "That is a happy person",
            "sentences": [
                "That is a happy dog",
                "That is a very happy person",
                "Today is a sunny day"
            ]
        }
    }
)
If similarity is simply computed with a dot product (cosine similarity), why do we need to call the API? I think I might be missing something obvious here…
hi @olaffson, the inference api is useful if your model needs to fit inside some larger application and you don’t want to worry about all the infrastructure concerns around scaling / deployment etc.
having said that, if you’re just tinkering with embeddings or don’t need to deploy the model, then it’s probably simpler to just load the model on your machine and compute the embeddings directly
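for comparison, here’s a rough local equivalent of the api example above (no token or http call needed), just encoding the source and candidate sentences yourself and taking cosine similarities:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L6-v2")
source = model.encode("That is a happy person")
candidates = model.encode([
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
])
print(util.cos_sim(source, candidates))  # one cosine-similarity score per candidate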