Hello HF Forum! I'm trying to deploy a GPTQ-quantized `llama-2-13b-hf` model via Inference Endpoints. However, every time my custom handler tries to initialize the quantized model, the log shows the error `entrypoint.sh: line 13: 28 Killed uvicorn webservice_starlette:app --host 0.0.0.0 --port 5000`, and the whole model-initialization process then restarts from the beginning. What is causing this, and how can I solve it? Please let me know!
My model is here: Cartinoe5930/llama-2-13B-GPTQ
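My custom handler follows the usual Inference Endpoints `EndpointHandler` pattern, roughly like this (a simplified sketch, not my exact file; the generation parameters are illustrative):

```python
# Simplified sketch of an Inference Endpoints custom handler for a GPTQ model.
# Assumes the endpoint image has transformers (with GPTQ support) installed;
# max_new_tokens and dtype below are illustrative choices, not requirements.
from typing import Any, Dict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the model repository files on the endpoint
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            device_map="auto",         # place weights on the available GPU(s)
            torch_dtype=torch.float16,
        )

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Standard request shape: {"inputs": "<prompt>"}
        inputs = self.tokenizer(data["inputs"], return_tensors="pt").to(
            self.model.device
        )
        with torch.no_grad():
            output_ids = self.model.generate(**inputs, max_new_tokens=128)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return {"generated_text": text}
```

The crash happens during `__init__`, i.e. while the model weights are being loaded.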
P.S. I used 1x Tesla T4 to deploy the model, since loading a quantized model should not need much GPU RAM.