ERROR: The size of tensor a () must match the size of tensor b () at non-singleton dimension 1

Hi all :wave:t4: Thanks in advance for the help.

I’m trying to migrate from running a SentenceTransformer model from Hugging Face on device to using an Inference Endpoint hosted by Hugging Face for feature extraction. I’ve set up a hosted inference service for the same model I was using locally.

What I’m doing now:

from sentence_transformers import SentenceTransformer

embeddings_model = SentenceTransformer("thenlper/gte-small")
embeddings_model.encode(docs, show_progress_bar=True)

What I’m trying to change to:


API_URL = "https://<redacted>.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
	"Authorization": "Bearer <redacted>",
    "Content-Type": "application/json"
}

def query(payload):
	response = requests.post(API_URL, headers=headers, json=payload)
	return response.json()
	
output = query({
	"inputs": docs,
})

Unfortunately, when I try this, I invariably hit an error such as The size of tensor a () must match the size of tensor b () at non-singleton dimension 1. I’ve tried chunking the data into smaller sections, but I still hit the same issue. Any guidance would be appreciated.

When testing, docs is a list of 4000 strings. The first (local) example works as expected, while the second (endpoint) one hits the error described.
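For reference, here is roughly what my chunking attempt looks like (the batch size of 32 is arbitrary, and the pure batching helper is just for illustration):

```python
import requests

# Redacted values as in the snippet above.
API_URL = "https://<redacted>.us-east-1.aws.endpoints.huggingface.cloud"
HEADERS = {
    "Authorization": "Bearer <redacted>",
    "Content-Type": "application/json",
}

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

def embed_in_batches(docs, batch_size=32):
    """POST the docs to the endpoint in small batches, collecting embeddings."""
    embeddings = []
    for batch in batches(docs, batch_size):
        resp = requests.post(API_URL, headers=HEADERS, json={"inputs": batch})
        resp.raise_for_status()  # surface HTTP errors instead of silently parsing an error body
        embeddings.extend(resp.json())
    return embeddings
```

Even with batches this small, I still see the same tensor-size error come back from the endpoint.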