Stuck starting inference model

I've been seeing the message "[Server message] Load balancer not ready yet" for a while now on Inference Endpoints (Hugging Face). I have tried recreating the inference endpoint and changing the instance size, but without luck. Could I get some help please?
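In case it helps narrow things down, here is a minimal sketch of how I'm polling the endpoint status from Python (assuming the huggingface_hub Inference Endpoints API; the endpoint name "my-endpoint" is a placeholder):

```python
# Sketch: poll a stuck Inference Endpoint until it leaves the pending/initializing states.
# Assumes a recent huggingface_hub with the Inference Endpoints API and a valid HF token.
import time

from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint")  # placeholder endpoint name

while endpoint.status not in ("running", "failed"):
    print(f"status={endpoint.status}")
    time.sleep(30)
    endpoint.fetch()  # refresh the status from the API

print(f"final status: {endpoint.status}, url: {endpoint.url}")
```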

I’m experiencing the same issue hosting Llama 2. I’ve even tried switching between different versions of the model, 7B and 13B.

These are the logs that get generated before it gets stuck.

2023/08/16 17:19:01 ~ INFO | Repository ID: meta-llama/Llama-2-7b-chat-hf
2023/08/16 17:19:01 ~ INFO | Repository Revision: 08751db2aca9bf2f7f80d2e516117a53d7450235
2023/08/16 17:19:01 ~ INFO | Start loading image artifacts from huggingface.co
2023/08/16 17:19:01 ~ INFO | Ignore regex pattern for files, which are not downloaded: tf*, flax*, rust*, *onnx, *safetensors, *mlmodel, *tflite, *tar.gz, *ckpt
2023/08/16 17:19:01 ~ INFO | Used configuration:
2023/08/16 17:20:27 ~ {"timestamp":"2023-08-17T00:20:27.894791Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"ackplain9886-aws-llama-2-7b-chat-hf-9863-67bf75fb89-6gn6n\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2023/08/16 17:20:27 ~ {"timestamp":"2023-08-17T00:20:27.894934Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2023/08/16 17:20:30 ~ {"timestamp":"2023-08-17T00:20:30.179786Z","level":"WARN","fields":{"message":"No safetensors weights found for model /repository at revision None. Converting PyTorch weights to safetensors.\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:36 ~ {"timestamp":"2023-08-17T00:21:36.274320Z","level":"INFO","fields":{"message":"Convert: [1/2] -- Took: 0:01:06.093642\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.052882Z","level":"INFO","fields":{"message":"Convert: [2/2] -- Took: 0:00:04.778280\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.653348Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.653577Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.039672Z","level":"INFO","fields":{"message":"Server started at unix:///tmp/text-generation-server-0\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.060661Z","level":"INFO","fields":{"message":"Shard ready in 7.406491831s"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.159300Z","level":"INFO","fields":{"message":"Starting Webserver"},"target":"text_generation_launcher"}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.215664Z","level":"WARN","message":"no pipeline tag found for model /repository","target":"text_generation_router","filename":"router/src/main.rs","line_number":191}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.220354Z","level":"INFO","message":"Warming up model","target":"text_generation_router","filename":"router/src/main.rs","line_number":210}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528107Z","level":"INFO","message":"Connected","target":"text_generation_router","filename":"router/src/main.rs","line_number":244}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528113Z","level":"WARN","message":"Invalid hostname, defaulting to 0.0.0.0","target":"text_generation_router","filename":"router/src/main.rs","line_number":249}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528075Z","level":"INFO","message":"Setting max batch total tokens to 15440","target":"text_generation_router","filename":"router/src/main.rs","line_number":243}
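For what it's worth, the logs above show the TGI shard and webserver coming up normally ("Shard ready", "Starting Webserver", "Setting max batch total tokens to 15440"), so the container itself looks healthy and the problem seems to sit in front of it, at the load balancer. Once the endpoint URL actually resolves, a quick way to confirm the container is reachable is TGI's health and info routes (a sketch; the URL and token below are placeholders):

```python
# Sketch: basic reachability check against a TGI-backed endpoint once its URL resolves.
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_..."  # placeholder token with access to the endpoint

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

# /health returns HTTP 200 when the text-generation-inference router is ready to serve.
print(requests.get(f"{ENDPOINT_URL}/health", headers=headers).status_code)

# /info reports the loaded model and its limits (e.g. the max_input_length /
# max_total_tokens visible in the launcher args above).
print(requests.get(f"{ENDPOINT_URL}/info", headers=headers).json())
```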

Same here, @ncolina. I’m using gte-large and the model seems to start, but the load balancer never does.

Hi @vadman97, @ncolina - Thanks for reporting, and sorry for the wait! This load balancer issue should now be resolved. Please let us know if you continue to encounter issues when deploying the endpoint. Thanks again!

It’s been more than half an hour now and the load balancer is still stuck:
Server message: Load balancer not ready yet

Hi @floripaolo, Thanks for letting us know. We’re taking a look and we’ll get back to you soon.

Restarting the endpoint solved the problem.
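For anyone who hits this later: the restart can also be done programmatically by pausing and resuming the endpoint (a sketch assuming the huggingface_hub Inference Endpoints API; "my-endpoint" is a placeholder):

```python
# Sketch: restart an Inference Endpoint by pausing and resuming it, then wait until it is running.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint")  # placeholder endpoint name

endpoint.pause()             # stop the endpoint
endpoint.resume()            # start it again
endpoint.wait(timeout=1800)  # block until the status is "running"; raises if the timeout is reached
print(endpoint.status, endpoint.url)
```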
