Inference Endpoint Fails to Start

Whenever I try to start an Inference Endpoint, it fails.

Setup:

Endpoint name: aws-dolphin-2-5-mixtral-8x7b-934
Provider: AWS
Region: us-east-1
Instance: GPU · Nvidia Tesla T4 · 4x GPU · 64 GB

Log:

66d6896fchcwbw 2023-12-14T10:25:02.055Z  INFO | Repository ID: ehartford/dolphin-2.5-mixtral-8x7b
66d6896fchcwbw 2023-12-14T10:25:02.055Z  INFO | Repository Revision: cbf205ee8b82ec81cec217b124e5d57804c3139a

66d6896fchcwbw 2023-12-14T10:26:53.761Z Error: DownloadError
66d6896fchcwbw 2023-12-14T10:26:53.761Z {"timestamp":"2023-12-14T10:26:53.761152Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n... huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'... ValueError: Can't find 'adapter_config.json' at '/repository'"}} [...]

Running into this error as well. Contacted support.

Did support fix this issue?

Yes. Everything is working for me now.

Hi, how did you fix this? What steps did support tell you? I’m also facing the same error currently…

They didn’t provide me with any steps. It was an internal error that they fixed.
You should contact support with your details.

Hi @alexsafayan! Thanks for reporting this issue. We’re currently investigating, and I’ll update you as soon as I have more information or there’s a fix.

Hello @meganariley. Same problem here with the same instance configuration but different models, e.g. LeoLM/leo-hessianai-7b-chat.

Thanks in advance!

Thanks @jostelo! I’ll keep you updated as well. :hugs:

Hello @meganariley. Could you resolve my issue too? While trying to create an Inference Endpoint, I get this error: “Server message: Endpoint failed to start. Endpoint failed”

The model: nonprof/llama2-7b-finetuned-counsellor-full-model.

In the logs:
HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File "/opt/conda/bin/text-generation-server", line 8, in <module>\n sys.exit(app())\n\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File "/opt/conda/lib/python3.10/site-packages/peft/utils/config.py", line 121, in from_pretrained\n raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
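For what it’s worth, here is a quick check (a sketch using huggingface_hub; substitute your own repo id) to see whether the repo on the Hub actually contains an adapter_config.json. The traceback suggests the endpoint detected my repo as a PEFT adapter and then couldn’t find the adapter config in the mounted copy:

from huggingface_hub import list_repo_files

# list the files of the repo being deployed (mine shown here)
files = list_repo_files("nonprof/llama2-7b-finetuned-counsellor-full-model")
# True -> the repo ships a PEFT adapter; False -> it is a full model
print("adapter_config.json" in files)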

Thanks in advance! You could be my lifesaver!

I am having the same (or at least very similar) problem. I am trying to create an Inference Endpoint and it fails to start.
The model I am using is: tiiuae/falcon-40b-instruct
The configuration is: AWS us-east-1 GPU · Nvidia Tesla T4 · 4x GPU · 64 GB
The complete log output is below, but the relevant part seems to be the same “HFValidationError: Repo id must use alphanumeric chars” error as in the posts above.

Any help would be greatly appreciated!

Here’s the complete log:


2024/01/03 21:52:54 ~ INFO | Start loading image artifacts from huggingface.co
2024/01/03 21:52:54 ~ INFO | Used configuration:
2024/01/03 21:52:54 ~ INFO | Repository ID: tiiuae/falcon-40b-instruct
2024/01/03 21:52:54 ~ INFO | Repository Revision: ecb78d97ac356d098e79f0db222c9ce7c5d9ee5f
2024/01/03 21:52:54 ~ INFO | Ignore regex pattern for files, which are not downloaded: *tflite, flax*, *ckpt, tf*, *onnx*, *tar.gz, *safetensors, *mlmodel, rust*, *openvino*
2024/01/03 21:54:05 ~ Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
2024/01/03 21:54:05 ~ Login successful
2024/01/03 21:54:05 ~ Token is valid.
2024/01/03 21:54:05 ~ Your token has been saved to /root/.cache/huggingface/token
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200748Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200602Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200639Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:54:44 ~ {"timestamp":"2024-01-04T02:54:44.400443Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:44 ~ {"timestamp":"2024-01-04T02:54:44.400411Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:45 ~ Error: DownloadError
2024/01/03 21:54:45 ~ {"timestamp":"2024-01-04T02:54:45.007548Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323170Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323061Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323014Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.392245Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.392299Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ Error: DownloadError
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.929184Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040546Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040419Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040456Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.260841Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.260808Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ Error: DownloadError
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.847314Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040752Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040611Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040648Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.095858Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.095902Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ Error: DownloadError
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.647317Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052474Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052425Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052600Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}

Hi @meganariley. I have the same issue when I try to create an Inference Endpoint. I receive this error message: “Server message: Endpoint failed to start. Endpoint failed”

Setup:
Model: mixtral-8x7b-instruct-v0-1-7
Configuration: GPU · Nvidia A10G · 1x GPU · 24 GB

Any help would be greatly appreciated! You could save my life!

I can provide the logs if necessary!

Hello @meganariley,

I ran into the same problem with the following setup:
Model: defog/sqlcoder2
Provider: AWS
Region: us-east-1
GPU: Nvidia A10G

Thanks

@eugenecamus @Vsauv @IBLLMTST @GowthamYarlagadda @jostelo @alexsafayan

Thanks for your patience while we looked into this! We just deployed a fix, so you should be all set now. Please make sure to recreate the endpoint, and let us know if you run into any issues.

Works! Thank you very much!

Thank you, Megan!

Hi @meganariley - I’m still experiencing this issue. I’ve recreated my endpoint but I get the same error.

I’m using a fork of the pyannote/speaker-diarization-3.1 model so that I can create my own handler.

  • I’ve tried using `path` in the `__init__` function, which is set to “/repository”
  • I’ve tried making my fork both public and private
  • I’ve tried passing “pyannote/speaker-diarization-3.1” instead of `path`
  • I’ve tried making my fork public and using that model name instead of `path`

But I always get this error:

Server message: Endpoint failed to start.

new fontManager
/opt/conda/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 705, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 584, in __aenter__
    await self._router.startup()
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 682, in startup
    await handler()
  File "/app/webservice_starlette.py", line 57, in some_startup_task
    inference_handler = get_inference_handler_either_custom_or_default_handler(HF_MODEL_DIR, task=HF_TASK)
  File "/app/huggingface_inference_toolkit/handler.py", line 41, in get_inference_handler_either_custom_or_default_handler
    custom_pipeline = check_and_register_custom_pipeline_from_directory(model_dir)
  File "/app/huggingface_inference_toolkit/utils.py", line 190, in check_and_register_custom_pipeline_from_directory
    custom_pipeline = handler.EndpointHandler(model_dir)
  File "/repository/handler.py", line 31, in __init__
    self._pipeline = Pipeline.from_pretrained(path)
  File "/opt/conda/lib/python3.9/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.
Application startup failed. Exiting.

This is my handler:

from pyannote.audio import Pipeline, Audio
import torch


class EndpointHandler:
    def __init__(self, path=""):
        # initialize pretrained pipeline
        self._pipeline = Pipeline.from_pretrained(path)  # I've tried varations on this line as described above
        HYPER_PARAMETERS = {
            "segmentation": {
                "min_duration_off": 3.0,
            }
        }
        self._pipeline.instantiate(HYPER_PARAMETERS)

        # send pipeline to GPU if available
        if torch.cuda.is_available():
            self._pipeline.to(torch.device("cuda"))

        # initialize audio reader
        self._io = Audio()

I’ve tried this on a GPU and CPU and I get the same error. I’ve tried recreating the endpoint multiple times.
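One more thing I am considering (a sketch, assuming my fork keeps its config.yaml at the repo root; load_pipeline is a hypothetical helper): pyannote’s Pipeline.from_pretrained also accepts a path to a local config file, so pointing it at the file inside the mounted /repository directory might avoid the fall-through to hf_hub_download:

import os

from pyannote.audio import Pipeline

def load_pipeline(path: str) -> Pipeline:
    config = os.path.join(path, "config.yaml")
    if os.path.isfile(config):
        # local checkout mounted at /repository
        return Pipeline.from_pretrained(config)
    # otherwise treat `path` as a Hub repo id
    return Pipeline.from_pretrained(path)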

Please let me know what I can do.

Thanks,
Collin