Endpoint Deployment Failed

Exit code: 1. Reason:
s/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  return func(*args, **kwargs)"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-12-08T02:46:24.695372Z","level":"ERROR","fields":{"message":"Shard 0 crashed"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695409Z","level":"INFO","fields":{"message":"Terminating webserver"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695438Z","level":"INFO","fields":{"message":"Waiting for webserver to gracefully shutdown"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695556Z","level":"INFO","message":"signal received, starting graceful shutdown","target":"text_generation_router::server","filename":"router/src/server.rs","line_number":2485}
{"timestamp":"2024-12-08T02:46:24.995806Z","level":"INFO","fields":{"message":"webserver terminated"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.995837Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardFailed

My first deployment yesterday worked. I had set it to scale to zero after 15 minutes, but when I accessed it again today and initialized it, it failed. Then I went and tried Google's deployment again, and again it failed. I am deploying a Llama 3.1 model after fine-tuning it.


Hi @expresscompany, thanks for posting. We recently updated the version of TGI. When deploying the Endpoint for your model, under 'Container Configuration' you can set the Container Type to 'Custom' and the Container URI to `ghcr.io/huggingface/text-generation-inference:3.0.0`. I'm attaching a screenshot of what this looks like, just in case. Once you update the TGI version, you should be able to deploy the endpoint successfully.
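For anyone managing Endpoints from code rather than the UI, the same custom-container pin can be expressed with the `huggingface_hub` client. This is a minimal sketch, not a definitive recipe: the endpoint name, model repository, cloud vendor, region, and instance sizing below are placeholder assumptions you must replace with your own, and the actual call requires a valid Hugging Face token with Endpoints access.

```python
# Pin the TGI image suggested above via the Endpoints "custom image" config.
# Everything except the image URL is a placeholder to adapt to your setup.
custom_image = {
    "health_route": "/health",  # TGI exposes its health check here
    "url": "ghcr.io/huggingface/text-generation-inference:3.0.0",
    "env": {
        # Inference Endpoints mounts the model repo at /repository.
        "MODEL_ID": "/repository",
    },
}

def deploy(token: str):
    """Create an endpoint using the pinned TGI container.

    Assumed/placeholder values: endpoint name, repository, vendor,
    region, and instance type/size.
    """
    # Imported inside the function so the config above can be inspected
    # without huggingface_hub installed.
    from huggingface_hub import create_inference_endpoint

    return create_inference_endpoint(
        "my-llama-endpoint",
        repository="my-user/my-finetuned-llama-3.1",
        framework="pytorch",
        task="text-generation",
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        custom_image=custom_image,
        token=token,
    )
```

The key detail is `custom_image["url"]`: it plays the same role as the 'Container URI' field in the UI, so updating the tag there is equivalent to the fix described above.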
