Inference Endpoint Fails to Start

Whenever I try to start an Inference Endpoint, it fails.

Setup:

Endpoint name: aws-dolphin-2-5-mixtral-8x7b-934
Provider: AWS
Region: us-east-1
Instance: GPU · Nvidia Tesla T4 · 4x GPU · 64 GB

Log:

66d6896fchcwbw 2023-12-14T10:25:02.055Z  INFO | Repository ID: ehartford/dolphin-2.5-mixtral-8x7b
66d6896fchcwbw 2023-12-14T10:25:02.055Z  INFO | Repository Revision: cbf205ee8b82ec81cec217b124e5d57804c3139a

66d6896fchcwbw 2023-12-14T10:26:53.761Z Error: DownloadError
66d6896fchcwbw 2023-12-14T10:26:53.761Z {"timestamp":"2023-12-14T10:26:53.761152Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n... huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'... ValueError: Can't find 'adapter_config.json' at '/repository'"}} [...]

Running into this error as well. Contacted support.

Did support fix this issue?

Yes. Everything is working for me now.

Hi, how did you fix this? What steps did support tell you? I’m also facing the same error currently…

They didn’t provide me with any steps. It was an internal error that they fixed.
You should contact support with your details.

Hi @alexsafayan! Thanks for reporting this issue. We’re currently investigating, and I’ll update you as soon as I have more information or there’s a fix.

Hello @meganariley. Same problem here with the same instance configuration but different models, e.g. LeoLM/leo-hessianai-7b-chat.

Thanks in advance!

Thanks @jostelo! I’ll keep you updated as well. :hugs:

Hello @meganariley. Could you resolve my issue too? While trying to create an Inference Endpoint, I get this error: “Server message: Endpoint failed to start. Endpoint failed”

The model: nonprof/llama2-7b-finetuned-counsellor-full-model.

In the logs:
HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File "/opt/conda/bin/text-generation-server", line 8, in <module>\n sys.exit(app())\n\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File "/opt/conda/lib/python3.10/site-packages/peft/utils/config.py", line 121, in from_pretrained\n raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
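For what it’s worth, here is a quick check (a sketch using huggingface_hub; substitute your own repo id) to see whether the repo on the Hub actually contains an adapter_config.json. The traceback suggests the endpoint detected my repo as a PEFT adapter and then couldn’t find the adapter config in the mounted copy:

from huggingface_hub import list_repo_files

# list the files of the repo being deployed (mine shown here)
files = list_repo_files("nonprof/llama2-7b-finetuned-counsellor-full-model")
# True -> the repo ships a PEFT adapter; False -> it is a full model
print("adapter_config.json" in files)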

Thanks in advance! You could be my lifesaver!

I am having the same (or at least very similar) problem. I am trying to create an Inference Endpoint and it fails to start.
The model I am using is: tiiuae/falcon-40b-instruct
The configuration is: AWS us-east-1 GPU · Nvidia Tesla T4 · 4x GPU · 64 GB
The complete log output is below, but the relevant part seems to be the same “HFValidationError: Repo id must use alphanumeric chars” error as in the posts above.

Any help would be greatly appreciated!

Here’s the complete log:


2024/01/03 21:52:54 ~ INFO | Start loading image artifacts from huggingface.co
2024/01/03 21:52:54 ~ INFO | Used configuration:
2024/01/03 21:52:54 ~ INFO | Repository ID: tiiuae/falcon-40b-instruct
2024/01/03 21:52:54 ~ INFO | Repository Revision: ecb78d97ac356d098e79f0db222c9ce7c5d9ee5f
2024/01/03 21:52:54 ~ INFO | Ignore regex pattern for files, which are not downloaded: *tflite, flax*, *ckpt, tf*, *onnx*, *tar.gz, *safetensors, *mlmodel, rust*, *openvino*
2024/01/03 21:54:05 ~ Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
2024/01/03 21:54:05 ~ Login successful
2024/01/03 21:54:05 ~ Token is valid.
2024/01/03 21:54:05 ~ Your token has been saved to /root/.cache/huggingface/token
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200748Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200602Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:54:39 ~ {"timestamp":"2024-01-04T02:54:39.200639Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:54:44 ~ {"timestamp":"2024-01-04T02:54:44.400443Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:44 ~ {"timestamp":"2024-01-04T02:54:44.400411Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:45 ~ Error: DownloadError
2024/01/03 21:54:45 ~ {"timestamp":"2024-01-04T02:54:45.007548Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323170Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323061Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:54:46 ~ {"timestamp":"2024-01-04T02:54:46.323014Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.392245Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.392299Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:54:51 ~ Error: DownloadError
2024/01/03 21:54:51 ~ {"timestamp":"2024-01-04T02:54:51.929184Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040546Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040419Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:55:07 ~ {"timestamp":"2024-01-04T02:55:07.040456Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.260841Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.260808Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:12 ~ Error: DownloadError
2024/01/03 21:55:12 ~ {"timestamp":"2024-01-04T02:55:12.847314Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040752Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040611Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:55:41 ~ {"timestamp":"2024-01-04T02:55:41.040648Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.095858Z","level":"INFO","fields":{"message":"Peft model detected.\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.095902Z","level":"INFO","fields":{"message":"Loading the model it might take a while without feedback\n"},"target":"text_generation_launcher"}
2024/01/03 21:55:46 ~ Error: DownloadError
2024/01/03 21:55:46 ~ {"timestamp":"2024-01-04T02:55:46.647317Z","level":"ERROR","fields":{"message":"Download encountered an error: Traceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 16, in download_and_unload_peft\n model = AutoPeftModelForCausalLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 117, in from_pretrained\n config_file = hf_hub_download(\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 110, in _inner_fn\n validate_repo_id(arg_value)\n\n File \"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py\", line 164, in validate_repo_id\n raise HFValidationError(\n\nhuggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 204, in download_weights\n utils.download_and_unload_peft(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py\", line 24, in download_and_unload_peft\n model = AutoPeftModelForSeq2SeqLM.from_pretrained(\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/auto.py\", line 69, in from_pretrained\n peft_config = PeftConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)\n\n File \"/opt/conda/lib/python3.10/site-packages/peft/utils/config.py\", line 121, in from_pretrained\n raise ValueError(f\"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'\")\n\nValueError: Can't find 'adapter_config.json' at '/repository'\n\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052474Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052425Z","level":"INFO","fields":{"message":"Args { model_id: \"/repository\", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 1512, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 2048, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: \"e-8a68-aws-falcon-40b-instruct-1221-fdd998944-hr86z\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }"},"target":"text_generation_launcher"}
2024/01/03 21:56:35 ~ {"timestamp":"2024-01-04T02:56:35.052600Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}

Hi @meganariley. I have the same issue when I try to create an Inference Endpoint. I receive this error message: “Server message: Endpoint failed to start. Endpoint failed”

Setup:
Model: mixtral-8x7b-instruct-v0-1-7
Configuration: GPU · Nvidia A10G · 1x GPU · 24 GB

Any help would be greatly appreciated! You could save my life!

I can provide the logs if necessary!

Hello @meganariley,

I ran into the same problem with the following setup:
Model: defog/sqlcoder2
Provider: AWS
Region: us-east-1
GPU: Nvidia A10G

Thanks

@eugenecamus @Vsauv @IBLLMTST @GowthamYarlagadda @jostelo @alexsafayan

Thanks for your patience while we looked into this! We just deployed a fix, so you should be all set now. Please make sure to recreate the endpoint, and let us know if you run into any issues.

Works! Thank you very much!

Thank you, Megan!

Hi @meganariley - I’m still experiencing this issue. I’ve recreated my endpoint but I get the same error.

I’m using a fork of the pyannote/speaker-diarization-3.1 model so that I can create my own handler.

  • I’ve tried using `path` in the `__init__` function, which is set to “/repository”
  • I’ve tried making my fork both public and private
  • I’ve tried passing “pyannote/speaker-diarization-3.1” instead of `path`
  • I’ve tried making my fork public and using that model name instead of `path`

But I always get this error:

Server message: Endpoint failed to start.

new fontManager
/opt/conda/lib/python3.9/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 705, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 584, in __aenter__
    await self._router.startup()
  File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 682, in startup
    await handler()
  File "/app/webservice_starlette.py", line 57, in some_startup_task
    inference_handler = get_inference_handler_either_custom_or_default_handler(HF_MODEL_DIR, task=HF_TASK)
  File "/app/huggingface_inference_toolkit/handler.py", line 41, in get_inference_handler_either_custom_or_default_handler
    custom_pipeline = check_and_register_custom_pipeline_from_directory(model_dir)
  File "/app/huggingface_inference_toolkit/utils.py", line 190, in check_and_register_custom_pipeline_from_directory
    custom_pipeline = handler.EndpointHandler(model_dir)
  File "/repository/handler.py", line 31, in __init__
    self._pipeline = Pipeline.from_pretrained(path)
  File "/opt/conda/lib/python3.9/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '/repository'.
Application startup failed. Exiting.

This is my handler:

from pyannote.audio import Pipeline, Audio
import torch


class EndpointHandler:
    def __init__(self, path=""):
        # initialize pretrained pipeline
        self._pipeline = Pipeline.from_pretrained(path)  # I've tried varations on this line as described above
        HYPER_PARAMETERS = {
            "segmentation": {
                "min_duration_off": 3.0,
            }
        }
        self._pipeline.instantiate(HYPER_PARAMETERS)

        # send pipeline to GPU if available
        if torch.cuda.is_available():
            self._pipeline.to(torch.device("cuda"))

        # initialize audio reader
        self._io = Audio()

I’ve tried this on a GPU and CPU and I get the same error. I’ve tried recreating the endpoint multiple times.
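One more thing I am considering (a sketch, assuming my fork keeps its config.yaml at the repo root; load_pipeline is a hypothetical helper): pyannote’s Pipeline.from_pretrained also accepts a path to a local config file, so pointing it at the file inside the mounted /repository directory might avoid the fall-through to hf_hub_download:

import os

from pyannote.audio import Pipeline

def load_pipeline(path: str) -> Pipeline:
    config = os.path.join(path, "config.yaml")
    if os.path.isfile(config):
        # local checkout mounted at /repository
        return Pipeline.from_pretrained(config)
    # otherwise treat `path` as a Hub repo id
    return Pipeline.from_pretrained(path)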

Please let me know what I can do.

Thanks,
Collin