from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)  # avoid division by zero
    return sum_embeddings / sum_mask

# Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of strings.',
             'The quick brown fox jumps over the lazy dog.']

# Load the tokenizer and model from the Hugging Face model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling (in this case, mean pooling)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
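As a quick sanity check (assuming the model downloads successfully), sentence_embeddings should hold one 768-dimensional vector per input sentence, since 768 is the hidden size of this bert-base model:

print(sentence_embeddings.shape)  # torch.Size([3, 768]): 3 sentences, 768-dim embeddings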
Hey @vitali, I believe integration of sentence-transformers with the Inference API is currently in progress, so maybe @osanseviero can share some details (or whether it's currently possible).
Your question comes at a good time. You can already do this by calling https://api-inference.huggingface.co/pipeline/feature-extraction/MODEL_ID. This endpoint is in an experimental state at the moment, so things might not be stable.
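For reference, a minimal sketch of calling that endpoint with requests. The {"inputs": ...} payload shape and the Authorization header follow general Inference API conventions rather than documented behavior of this experimental endpoint, and API_TOKEN is a placeholder for your own token:

import requests

API_TOKEN = "hf_..."  # placeholder: your Hugging Face API token
MODEL_ID = "sentence-transformers/bert-base-nli-mean-tokens"
url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL_ID}"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"inputs": ["This framework generates embeddings for each input sentence"]},
)
embeddings = response.json()  # nested list of floats, one embedding per input string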
Note that, as of now, we’re working on deeply integrating sentence-transformers with the Hub. This will be part of the v2 release of the library. Some details:
Allow downloading sentence-transformer models from the Hub (PR, merged).
Allow uploading sentence-transformer models to the Hub (PR).
The pipeline is usually inferred from the tags in the model repo. Forcing a pipeline through the API carries a risk of misusing a model for a task it wasn't intended for, but some models do support multiple pipelines, so you can use that mechanism as well.
So if I add the tag “feature-extraction” in the model repo, then a call to the Inference API will produce embeddings? What will happen if I add multiple tags, e.g. “feature-extraction”, “fill-mask”, “zero-shot-classification”?
Hi @vitali. We currently try to keep things simple: we usually have one task per model, and this holds for most models. But in case your model does support other tasks, you can use the API URL as above.
Right now there's no way to validate that a model will actually work with a given task, which is why this isn't shared widely: it might lead to misuse and incorrect results.
Thank you very much for your reply. I understand the issue: every transformer model can serve some tasks (producing embeddings, MLM, zero-shot classification), but no model can serve all of them, hence the risk of misuse. Perhaps it would make sense to add a “capabilities.json” to the model repo that defines the list of supported tasks/pipelines based on the model architecture? I think this would clear up some confusion among users, just a thought. Anyway, this approach solves my need to generate embeddings for downstream tasks perfectly, thank you very much again.
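To make that suggestion concrete, such a capabilities.json might look something like the sketch below. The file name, keys, and values are purely hypothetical (the poster's proposal, not an existing Hub feature):

{
  "architecture": "BertModel",
  "supported_pipelines": ["feature-extraction", "fill-mask"]
}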
Hey Omar, is this still an experimental API? I can't seem to find any details about it. I would appreciate you pointing me to some resources if available, as I am evaluating some APIs and would like to test yours. Thanks.
Hello Omar, I was trying to load embeddings via the API as discussed in this thread, but I am struggling to find a model that actually supports this method.