Loading a large model - endpoint gets killed by ping health check

Hi,

I am trying to deploy a large model (25 GB). Every time I start up an endpoint, it gets killed by SageMaker for not passing the ping health check.

The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.

The log shows that the container is still downloading the model; it is usually about 80% of the way through the largest file when SageMaker decides the endpoint is unhealthy. The download simply takes longer than the health checker allows.

I pulled the image and ran it locally in Docker, and verified that it only starts responding to /ping once the model has been loaded.
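
Roughly what that local check looked like (a minimal sketch; it assumes the container is already running locally with port 8080 published):

```python
import time
import requests

# SageMaker inference containers answer health checks on /ping (port 8080).
# Locally, /ping only starts returning 200 once the 25 GB model has finished loading.
while True:
    try:
        status = requests.get("http://localhost:8080/ping", timeout=2).status_code
    except requests.RequestException:
        status = None
    print(time.strftime("%H:%M:%S"), "/ping ->", status)
    if status == 200:
        break
    time.sleep(10)
```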

What are my options?

  • Is it possible to disable the health check on SageMaker?
  • Is it possible to configure the timeout of the health check on SageMaker?
  • Do I need to build my own image with a hacked version of sagemaker_huggingface_inference_toolkit that runs a rudimentary HTTP server while the model is loading, and then figure out how to run that instead of the huggingface-pytorch-inference image?
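
For context, this is roughly what my deployment call looks like (a minimal sketch; the S3 URI, role, framework versions, and instance type are placeholders, not my real values). The two commented-out timeout kwargs are what I'm hoping answer the second bullet, assuming the installed SageMaker Python SDK version supports them:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # placeholder: any role with SageMaker + S3 access

# Placeholder model artifact (~25 GB model.tar.gz in S3) and framework versions.
model = HuggingFaceModel(
    model_data="s3://my-bucket/my-large-model/model.tar.gz",
    role=role,
    transformers_version="4.26.0",
    pytorch_version="1.13.1",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
    # If the SDK version in use supports them, these are the knobs I'm asking about:
    # container_startup_health_check_timeout=600,  # seconds allowed to pass /ping
    # model_data_download_timeout=1800,            # seconds allowed to pull the model from S3
)
```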

I’d be thankful if someone could share their experience with this.

I’m having the same issue. Any solutions?

Same here:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-2023-08-01-14-29-42-558: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.

Same here.

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-08-31-09-39-06-613: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint…

CloudWatch: RuntimeError: weight lm_head.weight does not exist

@philschmid penny for your thoughts!
