Endpoint Deployment Failed

Exit code: 1. Reason:
s/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  return func(*args, **kwargs)"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-12-08T02:46:24.695372Z","level":"ERROR","fields":{"message":"Shard 0 crashed"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695409Z","level":"INFO","fields":{"message":"Terminating webserver"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695438Z","level":"INFO","fields":{"message":"Waiting for webserver to gracefully shutdown"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.695556Z","level":"INFO","message":"signal received, starting graceful shutdown","target":"text_generation_router::server","filename":"router/src/server.rs","line_number":2485}
{"timestamp":"2024-12-08T02:46:24.995806Z","level":"INFO","fields":{"message":"webserver terminated"},"target":"text_generation_launcher"}
{"timestamp":"2024-12-08T02:46:24.995837Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardFailed

My first deployment yesterday worked. I had set it to scale to zero after 15 minutes, but when I accessed it again today and initialized it, it failed. Then I went and tried Google's deployment again, and again it failed. I am deploying a Llama 3.1 model after fine-tuning it.


Hi @expresscompany, thanks for posting. We recently updated the version of TGI. When deploying the Endpoint for your model, under 'Container Configuration' you can set the Container Type to 'Custom' and the Container URI to `ghcr.io/huggingface/text-generation-inference:3.0.0`. I'm attaching a screenshot of what this looks like, just in case. Once you update the TGI version, you should be able to deploy the endpoint successfully.
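For anyone managing Endpoints from code rather than the UI, the same custom-container pin can be expressed with the `huggingface_hub` client. This is a minimal sketch, not a definitive recipe: the endpoint name, model repository, cloud vendor, region, and instance sizing below are placeholder assumptions you must replace with your own, and the actual call requires a valid Hugging Face token with Endpoints access.

```python
# Pin the TGI image suggested above via the Endpoints "custom image" config.
# Everything except the image URL is a placeholder to adapt to your setup.
custom_image = {
    "health_route": "/health",  # TGI exposes its health check here
    "url": "ghcr.io/huggingface/text-generation-inference:3.0.0",
    "env": {
        # Inference Endpoints mounts the model repo at /repository.
        "MODEL_ID": "/repository",
    },
}

def deploy(token: str):
    """Create an endpoint using the pinned TGI container.

    Assumed/placeholder values: endpoint name, repository, vendor,
    region, and instance type/size.
    """
    # Imported inside the function so the config above can be inspected
    # without huggingface_hub installed.
    from huggingface_hub import create_inference_endpoint

    return create_inference_endpoint(
        "my-llama-endpoint",
        repository="my-user/my-finetuned-llama-3.1",
        framework="pytorch",
        task="text-generation",
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        custom_image=custom_image,
        token=token,
    )
```

The key detail is `custom_image["url"]`: it plays the same role as the 'Container URI' field in the UI, so updating the tag there is equivalent to the fix described above.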
