Why does BGE large v1.5 return far more than 1024 values from a SageMaker endpoint?

I deployed a sentence embedding model to AWS SageMaker using their prebuilt ECR images, which come with PyTorch and transformers preloaded. They are set up to take a Hugging Face model name and a Hugging Face task, and from that they stand up an API endpoint for inference.
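As I understand it, the container is configured purely through environment variables. In SageMaker Python SDK terms it's roughly this (the role, container versions, and instance type below are placeholders, not my actual setup):

```python
# Rough SDK equivalent of what the Terraform module wires up.
# Role, versions, and instance type are placeholders.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/my-sagemaker-role",  # placeholder
    env={
        "HF_MODEL_ID": "BAAI/bge-large-en-v1.5",  # model pulled from the Hub
        "HF_TASK": "feature-extraction",          # task handed to the pipeline
    },
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder
)
```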

The problem is that the endpoint works, but it returns roughly 300,000 values instead of the expected 1024-dimensional embedding! Presumably that's one 1024-dimensional vector per input token, so I should be pooling the final tensor, but I don't know how to do that with these prebuilt images from Hugging Face.

I’m using this model: BAAI/bge-large-en-v1.5
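For now I can work around it client-side. Here's roughly how I'm invoking the endpoint and how I'd pool the result myself; the endpoint name is a placeholder, and I'm assuming the response shape is [batch, num_tokens, 1024] with the [CLS] token first, which is what the BGE model card's pooling recipe expects:

```python
import json

import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-bge-endpoint",  # placeholder name
    ContentType="application/json",
    Body=json.dumps({"inputs": "An example sentence to embed."}),
)

# Token-level hidden states: assumed shape [batch, num_tokens, 1024].
token_embeddings = np.array(json.loads(response["Body"].read()))
print(token_embeddings.shape)  # e.g. (1, 293, 1024) -- ~300,000 numbers

# CLS pooling per the BGE model card: take the first token's vector,
# then L2-normalize it to get one 1024-dimensional sentence embedding.
sentence_embedding = token_embeddings[0, 0]
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)
print(sentence_embedding.shape)  # (1024,)
```

That works, but I'd much rather have the endpoint return the pooled vector directly.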

I’m using this Terraform module to set up the endpoint, but under the hood it uses those same prebuilt images, which retrieve the model and set everything up for the inference endpoint.

How can I pass extra configuration or environment variables through the Terraform module so that the endpoint pools the tensor down to a single 1024-dimensional embedding?
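If there's no environment variable for pooling, my fallback idea is to repackage the model as a model.tar.gz with a code/inference.py inside, since the Hugging Face Inference Toolkit lets you override its default handlers that way. An untested sketch, doing the CLS pooling plus normalization that the BGE model card describes:

```python
# code/inference.py -- untested sketch for the Hugging Face Inference Toolkit.
# model_fn/predict_fn override the toolkit's default handlers.
import torch
from transformers import AutoModel, AutoTokenizer


def model_fn(model_dir):
    # Load the artifacts SageMaker unpacked into model_dir.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(
        data["inputs"],
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # CLS pooling + L2 normalization, per the BGE model card.
    embeddings = outputs.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return {"embeddings": embeddings.tolist()}  # [batch, 1024]
```

The Terraform module would then have to point at that archive in S3 instead of using HF_MODEL_ID, though, so I'd prefer an env-var-only fix if one exists.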

Here are the additional links to the prebuilt ECR images: