Hi all, thanks in advance for the help.
I'm trying to migrate from running a SentenceTransformer model from Hugging Face on-device to using a Hugging Face Inference Endpoint for feature extraction. I've set up a hosted endpoint for the same model I was using locally.
What I’m doing now:
```python
from sentence_transformers import SentenceTransformer

embeddings_model = SentenceTransformer("thenlper/gte-small")
embeddings = embeddings_model.encode(docs, show_progress_bar=True)
```
What I’m trying to change to:
```python
import requests

API_URL = "https://<redacted>.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer <redacted>",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": docs})
```
Unfortunately, when I try this, I invariably hit an error such as `The size of tensor a () must match the size of tensor b () at non-singleton dimension 1`. I've tried chunking the data down into smaller sections, but I still hit this issue. Any guidance would be appreciated.
When testing, `docs` is a list of 4000 strings. The first example works as expected, while the second one hits the error described above.
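For reference, this is roughly how I've been chunking the requests (a sketch, not working code: the batch size of 32 is arbitrary, and the `truncate` flag is my assumption based on the Text Embeddings Inference payload format; a plain transformers container may expect a different parameter):

```python
import requests

API_URL = "https://<redacted>.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer <redacted>",
    "Content-Type": "application/json",
}

def batched(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_all(docs, batch_size=32):
    """POST the docs to the endpoint in small batches and collect embeddings."""
    embeddings = []
    for batch in batched(docs, batch_size):
        response = requests.post(
            API_URL,
            headers=headers,
            # `truncate` is honored by Text Embeddings Inference containers
            # (assumption that this endpoint runs one); it clips inputs that
            # exceed the model's maximum sequence length.
            json={"inputs": batch, "truncate": True},
        )
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        embeddings.extend(response.json())
    return embeddings
```

My working theory is that some of the 4000 strings exceed the model's maximum sequence length, which `encode` truncates silently on-device, while the endpoint does not unless told to.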