Hello @MaximusDecimusMeridi,
by default, the Hugging Face Inference DLC starts as many workers as the instance has CPU cores, meaning on an m5n.xlarge
instance you have 4 workers.
Regarding the error you see:
- Are you using Multi-Model Endpoints?
- What was the memory utilization?
- How long does the request take? → It's possible that all workers were blocked, either by long-running inference or by a deadlock inside your code, and never finished, so no new requests could be accepted.
- Could you try updating to the latest image? Reference
- “During feature extraction another endpoint is being called for generating text embeddings” → does this mean the endpoint which returned the 503 calls another endpoint? (I couldn’t find anything like that in the script.) If that’s true, then point 3 might be the reason, since you would block the worker until the inner request is resolved, and generation can take quite long.
P.S. Feel free to share the full architecture of what you’re doing. Happy to help improve it and solve those bottlenecks with a more async approach.
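To illustrate the blocking issue from point 3: if each worker synchronously waits on an inner endpoint call, it can't serve anything else in the meantime, and with all workers waiting you get 503s. A minimal sketch of the async alternative, using `asyncio.sleep` as a hypothetical stand-in for the inner embedding/generation call (names and timings are assumptions, not your actual code):

```python
import asyncio
import time

# Hypothetical stand-in for the slow inner endpoint call
# (e.g. the text-embedding endpoint); sleeps instead of doing HTTP.
async def call_inner_endpoint(payload: str) -> str:
    await asyncio.sleep(0.2)  # simulated network + generation latency
    return f"embedding({payload})"

async def handle_batch(payloads):
    # Fire all inner calls concurrently instead of blocking
    # one worker per call until each response comes back.
    return await asyncio.gather(*(call_inner_endpoint(p) for p in payloads))

start = time.perf_counter()
results = asyncio.run(handle_batch(["a", "b", "c", "d"]))
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.2f}s")  # ~0.2s total instead of ~0.8s sequential
```

With real HTTP you'd use an async client (e.g. `aiohttp` or `httpx`) the same way; the point is that the worker's event loop stays free while the inner request is in flight.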