Using the Accelerated Inference API to produce sentence embeddings

Is it possible to use the Accelerated Inference API to produce sentence embeddings as described here?

from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask



#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of strings.',
             'The quick brown fox jumps over the lazy dog.']

#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

#Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

#Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

Hey @vitali, I believe integration with sentence-transformers in the Inference API is currently in progress, so maybe @osanseviero can share some details (or whether it’s currently possible).


Hi @vitali!

Your question comes at a good time. You can already do this by calling
https://api-inference.huggingface.co/pipeline/feature-extraction/MODEL_ID. This endpoint is in an experimental state at the moment, so things might not be stable.
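
For example, with the model from the first post, a call could look like this (a minimal sketch, assuming TOKEN is your API token):

import requests

API_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/bert-base-nli-mean-tokens"
headers = {"Authorization": "Bearer TOKEN"}

response = requests.post(API_URL, headers=headers, json={"inputs": "This is a test sentence."})
print(response.json())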

Note that, as of now, we’re working on deeply integrating sentence-transformers with the Hub. This will be part of the v2 release of the library. Some details:

  • Allow downloading sentence-transformer models from the Hub (PR, merged).
  • Allow uploading sentence-transformer models to the Hub (PR).

We expect to have more exciting results very soon :slight_smile:


Awesome! I will try it.

Can I infer from your answer that any pipeline can be invoked like this? :slight_smile: That would be totally awesome.

The pipeline is usually inferred from the tags in the model repo. Forcing the pipeline through the API carries a risk of misusing a model with a different pipeline, but certain models do support multiple pipelines, so you can use this for them as well.
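
To illustrate the two ways of calling the API (a sketch; MODEL_ID and TOKEN are placeholders, and the forced pipeline only works if the model actually supports the task):

import requests

MODEL_ID = "MODEL_ID"  # placeholder for a real model id
headers = {"Authorization": "Bearer TOKEN"}
payload = {"inputs": "Some input text"}

# Default endpoint: the pipeline is inferred from the model's tags
default_url = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

# Forced pipeline: explicitly request feature extraction
forced_url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL_ID}"

for url in (default_url, forced_url):
    print(requests.post(url, headers=headers, json=payload).json())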


So if I add the tag “feature-extraction” in the model repo, then a call to the Inference API will produce embeddings? What will happen if I add multiple tags, e.g. “feature-extraction”, “fill-mask”, “zero-shot-classification”?

Hi @vitali. We currently try to keep things simple: we usually have one task per model, and this holds for most models. But in case your model does support other tasks, you can use the API URL as above.

Right now there’s no way to validate that the model will work with the task, which is why this is not shared widely: it might lead to misuse and incorrect results.


Thank you very much for your reply, I understand the issue: every transformer model can serve some tasks (producing embeddings, MLM, zero-shot classification), but no model can serve all tasks, hence the risk of misuse. Perhaps it would make sense to add a “capabilities.json” to the model repo to define the list of supported tasks/pipelines based on the model architecture? I think this would clear up some confusion amongst users, just a thought. Anyway, this approach solves my need to produce embeddings for downstream tasks perfectly, thank you very much again.
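
As a sketch of that suggestion (entirely hypothetical, not an existing Hub feature), such a file could simply list the tasks the architecture supports:

import json

# Hypothetical capabilities.json declaring the supported tasks/pipelines
capabilities = {"supported_tasks": ["feature-extraction", "fill-mask"]}

with open("capabilities.json", "w") as f:
    json.dump(capabilities, f, indent=2)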

For community reference, the issue of defining and using model pipelines is also discussed on GitHub.


Hey Omar, is this still an experimental API? I can’t seem to find any details about it. Would appreciate you pointing me to some resources if available as I am evaluating some APIs and would like to test yours. Thanks.

Hey @abol3z.

The API is not experimental anymore, but we’re working on its documentation. You can use https://api-inference.huggingface.co/pipeline/feature-extraction/MODEL_ID to obtain sentence embeddings.

Let us know if you have any questions!


Hello Omar, I was trying to get embeddings via the API as discussed in this thread, but I am struggling to find a model that actually supports this method.

So for example, the model https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2 supports feature extraction, but the following URL is invalid:
https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2

Can you help?
Or maybe share a sample code snippet?

Thanks.

Hi there! :wave: Here is a working end-to-end example for the model you suggested.

import requests

API_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"
headers = {"Authorization": "Bearer TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "I like you. I love you",
})
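
If the call succeeds, output should be a single pooled sentence embedding rather than per-token vectors (for all-MiniLM-L6-v2 that would be a list of 384 floats):

# Quick check of the response shape
print(len(output))  # expected: 384 for all-MiniLM-L6-v2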

Thanks so much! This helped a lot.

Hello,

I’ve used the feature-extraction pipeline successfully with sentence-transformers models.

How do I use it with a model that requires mean_pooling to be applied to the result, such as E5?

To clarify: I want a single array of feature embeddings, versus the current result of one array per token (3 arrays for the word “test”).

Thank you!

Hi @mtomov, you might need a custom pipeline to process the result. Here is a duplicated model implementing the mean_pooling on the request: radames/e5-large · Hugging Face. You can also try:

import requests

API_URL = "https://api-inference.huggingface.co/models/radames/e5-large"
headers = {"Authorization": "Bearer TOKEN"}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


embeddings = query({
    "inputs": "query: how much protein should a female eat",
})
print(embeddings)
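
Alternatively, you can keep calling the plain feature-extraction pipeline on the original model and apply the mean pooling on the client side. A sketch, assuming the intfloat/e5-large checkpoint and that the response is a list of per-token vectors as described above (with a single unpadded input there is no padding, so a plain mean matches the masked mean):

import numpy as np
import requests

API_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/intfloat/e5-large"
headers = {"Authorization": "Bearer TOKEN"}

response = requests.post(API_URL, headers=headers, json={"inputs": "query: how much protein should a female eat"})
token_embeddings = np.array(response.json())  # shape: (num_tokens, hidden_size)

# Average over the token axis to get one sentence embedding
sentence_embedding = token_embeddings.mean(axis=0)
print(sentence_embedding.shape)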