Stuck starting inference model

I've been seeing the message "[Server message] Load balancer not ready yet" for a while now on Inference Endpoints (Hugging Face). I have tried recreating the inference endpoint and changing the instance size, but without luck. Could I get some help please?
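In case it helps narrow things down, here is a minimal sketch of how I'm polling the endpoint status from Python (assuming the huggingface_hub Inference Endpoints API; the endpoint name "my-endpoint" is a placeholder):

```python
# Sketch: poll a stuck Inference Endpoint until it leaves the pending/initializing states.
# Assumes a recent huggingface_hub with the Inference Endpoints API and a valid HF token.
import time

from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint")  # placeholder endpoint name

while endpoint.status not in ("running", "failed"):
    print(f"status={endpoint.status}")
    time.sleep(30)
    endpoint.fetch()  # refresh the status from the API

print(f"final status: {endpoint.status}, url: {endpoint.url}")
```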

I’m experiencing the same issue hosting Llama 2. I’ve even tried switching between different versions of the model, 7B and 13B.

These are the logs that get generated before it gets stuck.

2023/08/16 17:19:01 ~ INFO | Repository ID: meta-llama/Llama-2-7b-chat-hf
2023/08/16 17:19:01 ~ INFO | Repository Revision: 08751db2aca9bf2f7f80d2e516117a53d7450235
2023/08/16 17:19:01 ~ INFO | Start loading image artifacts from huggingface.co
2023/08/16 17:19:01 ~ INFO | Ignore regex pattern for files, which are not downloaded: tf*, flax*, rust*, *onnx, *safetensors, *mlmodel, *tflite, *tar.gz, *ckpt
2023/08/16 17:19:01 ~ INFO | Used configuration:
2023/08/16 17:20:27 ~ {"timestamp":"2023-08-17T00:20:27.894791Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"ackplain9886-aws-llama-2-7b-chat-hf-9863-67bf75fb89-6gn6n\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2023/08/16 17:20:27 ~ {"timestamp":"2023-08-17T00:20:27.894934Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2023/08/16 17:20:30 ~ {"timestamp":"2023-08-17T00:20:30.179786Z","level":"WARN","fields":{"message":"No safetensors weights found for model /repository at revision None. Converting PyTorch weights to safetensors.\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:36 ~ {"timestamp":"2023-08-17T00:21:36.274320Z","level":"INFO","fields":{"message":"Convert: [1/2] -- Took: 0:01:06.093642\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.052882Z","level":"INFO","fields":{"message":"Convert: [2/2] -- Took: 0:00:04.778280\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.653348Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2023/08/16 17:21:41 ~ {"timestamp":"2023-08-17T00:21:41.653577Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.039672Z","level":"INFO","fields":{"message":"Server started at unix:///tmp/text-generation-server-0\n"},"target":"text_generation_launcher"}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.060661Z","level":"INFO","fields":{"message":"Shard ready in 7.406491831s"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.159300Z","level":"INFO","fields":{"message":"Starting Webserver"},"target":"text_generation_launcher"}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.215664Z","level":"WARN","message":"no pipeline tag found for model /repository","target":"text_generation_router","filename":"router/src/main.rs","line_number":191}
2023/08/16 17:21:49 ~ {"timestamp":"2023-08-17T00:21:49.220354Z","level":"INFO","message":"Warming up model","target":"text_generation_router","filename":"router/src/main.rs","line_number":210}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528107Z","level":"INFO","message":"Connected","target":"text_generation_router","filename":"router/src/main.rs","line_number":244}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528113Z","level":"WARN","message":"Invalid hostname, defaulting to 0.0.0.0","target":"text_generation_router","filename":"router/src/main.rs","line_number":249}
2023/08/16 17:21:50 ~ {"timestamp":"2023-08-17T00:21:50.528075Z","level":"INFO","message":"Setting max batch total tokens to 15440","target":"text_generation_router","filename":"router/src/main.rs","line_number":243}
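For what it's worth, the logs above show the TGI shard and webserver coming up normally ("Shard ready", "Starting Webserver", "Setting max batch total tokens to 15440"), so the container itself looks healthy and the problem seems to sit in front of it, at the load balancer. Once the endpoint URL actually resolves, a quick way to confirm the container is reachable is TGI's health and info routes (a sketch; the URL and token below are placeholders):

```python
# Sketch: basic reachability check against a TGI-backed endpoint once its URL resolves.
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_..."  # placeholder token with access to the endpoint

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

# /health returns HTTP 200 when the text-generation-inference router is ready to serve.
print(requests.get(f"{ENDPOINT_URL}/health", headers=headers).status_code)

# /info reports the loaded model and its limits (e.g. the max_input_length /
# max_total_tokens visible in the launcher args above).
print(requests.get(f"{ENDPOINT_URL}/info", headers=headers).json())
```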

Same here, @ncolina. I’m using gte-large and the model seems to start, but the load balancer never does.

Hi @vadman97, @ncolina - Thanks for reporting, and sorry for the wait! This load balancer issue should now be resolved. Please let us know if you continue to encounter issues when deploying the endpoint. Thanks again!

It’s been more than half an hour now and the load balancer is still stuck:
Server message: Load balancer not ready yet

Hi @floripaolo, Thanks for letting us know. We’re taking a look and we’ll get back to you soon.

Restarting the endpoint solved the problem.
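For anyone who hits this later: the restart can also be done programmatically by pausing and resuming the endpoint (a sketch assuming the huggingface_hub Inference Endpoints API; "my-endpoint" is a placeholder):

```python
# Sketch: restart an Inference Endpoint by pausing and resuming it, then wait until it is running.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("my-endpoint")  # placeholder endpoint name

endpoint.pause()             # stop the endpoint
endpoint.resume()            # start it again
endpoint.wait(timeout=1800)  # block until the status is "running"; raises if the timeout is reached
print(endpoint.status, endpoint.url)
```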
