Why does BGE large v1.5 return far more than 1024 values from a SageMaker endpoint?

I deployed a sentence embedding model to AWS SageMaker using their prebuilt ECR images, which come with PyTorch and transformers preloaded. They are set up to take a Hugging Face model name and a Hugging Face task, and from that they stand up an API endpoint for inference.
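As I understand it, the container is configured purely through environment variables. In SageMaker Python SDK terms it's roughly this (the role, container versions, and instance type below are placeholders, not my actual setup):

```python
# Rough SDK equivalent of what the Terraform module wires up.
# Role, versions, and instance type are placeholders.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/my-sagemaker-role",  # placeholder
    env={
        "HF_MODEL_ID": "BAAI/bge-large-en-v1.5",  # model pulled from the Hub
        "HF_TASK": "feature-extraction",          # task handed to the pipeline
    },
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder
)
```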

The problem is that the endpoint works, but it returns roughly 300,000 values instead of the expected 1024-dimensional embedding! Presumably that's one 1024-dimensional vector per input token, so I should be pooling the final tensor, but I don't know how to do that with these prebuilt images from Hugging Face.

I’m using this model: BAAI/bge-large-en-v1.5
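For now I can work around it client-side. Here's roughly how I'm invoking the endpoint and how I'd pool the result myself; the endpoint name is a placeholder, and I'm assuming the response shape is [batch, num_tokens, 1024] with the [CLS] token first, which is what the BGE model card's pooling recipe expects:

```python
import json

import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-bge-endpoint",  # placeholder name
    ContentType="application/json",
    Body=json.dumps({"inputs": "An example sentence to embed."}),
)

# Token-level hidden states: assumed shape [batch, num_tokens, 1024].
token_embeddings = np.array(json.loads(response["Body"].read()))
print(token_embeddings.shape)  # e.g. (1, 293, 1024) -- ~300,000 numbers

# CLS pooling per the BGE model card: take the first token's vector,
# then L2-normalize it to get one 1024-dimensional sentence embedding.
sentence_embedding = token_embeddings[0, 0]
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)
print(sentence_embedding.shape)  # (1024,)
```

That works, but I'd much rather have the endpoint return the pooled vector directly.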

I’m using this Terraform module to set up the endpoint, but under the hood it uses those same prebuilt images, which retrieve the model and set everything up for the inference endpoint.

How can I pass extra configuration or environment variables through the Terraform module so that the endpoint pools the tensor down to a single 1024-dimensional embedding?
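If there's no environment variable for pooling, my fallback idea is to repackage the model as a model.tar.gz with a code/inference.py inside, since the Hugging Face Inference Toolkit lets you override its default handlers that way. An untested sketch, doing the CLS pooling plus normalization that the BGE model card describes:

```python
# code/inference.py -- untested sketch for the Hugging Face Inference Toolkit.
# model_fn/predict_fn override the toolkit's default handlers.
import torch
from transformers import AutoModel, AutoTokenizer


def model_fn(model_dir):
    # Load the artifacts SageMaker unpacked into model_dir.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(
        data["inputs"],
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # CLS pooling + L2 normalization, per the BGE model card.
    embeddings = outputs.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return {"embeddings": embeddings.tolist()}  # [batch, 1024]
```

The Terraform module would then have to point at that archive in S3 instead of using HF_MODEL_ID, though, so I'd prefer an env-var-only fix if one exists.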

Here are the additional links to the prebuilt ECR images: