There seems to be an internal authentication issue when deploying a private LoRA adapter with Inference Endpoints / TGI 3.1.0:
We have a private LoRA adapter for meta-llama/Llama-3.2-3B-Instruct on our organization account, and I'm trying to deploy it on an Inference Endpoint serving that Llama base model by setting the environment variable `LORA_ADAPTERS=IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA`.
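For reference, here is a rough local equivalent of the endpoint configuration using the TGI container; the image tag and the use of `HUGGING_FACE_HUB_TOKEN` to pass credentials are my assumptions about what the endpoint does internally:

```shell
# Sketch of the equivalent local TGI 3.1.0 launch (image tag assumed).
# HUGGING_FACE_HUB_TOKEN is the documented way to give TGI access to
# private/gated repos; LORA_ADAPTERS points at the adapter repo.
docker run --gpus all -p 8080:80 \
  -e HUGGING_FACE_HUB_TOKEN="$HF_TOKEN" \
  -e LORA_ADAPTERS=IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA \
  ghcr.io/huggingface/text-generation-inference:3.1.0 \
  --model-id meta-llama/Llama-3.2-3B-Instruct
```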
The deployment fails, the logs are inaccessible, and the Endpoint's Overview page shows the following error:
```
[Server message] Endpoint failed to start

Exit code: 1. Reason: ponse.status_code == 400:
  457 │ │ │ message = (

  ╭──────────────────────── locals ────────────────────────╮
  │ endpoint_name = None                                   │
  │ error_code    = None                                   │
  │ error_message = 'Invalid username or password.'        │
  │ message       = '401 Client Error.\n\nRepository Not   │
  │                 Found for url:                         │
  │                 https://huggingface.co/api/mode'+208   │
  │ response      = <Response [401]>                       │
  ╰────────────────────────────────────────────────────────╯

RepositoryNotFoundError: 401 Client Error. (Request ID:
Root=1-67be036e-09142c4566699c28451be1ca;f6aa8277-5e41-46bb-814b-59b36d834f05)

Repository Not Found for url:
https://huggingface.co/api/models/IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are
authenticated.
Invalid username or password."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}

Error: DownloadError
```
Testing with a public LoRA adapter instead (e.g. `LORA_ADAPTERS=RayBernard/llama3.2-3B-ft-reasoning`) works fine.
Loading the private adapter in Python also works without issue, once authenticated:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct", device_map="auto"
)
model = PeftModel.from_pretrained(
    model, "IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA"
)
```
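As a quick sanity check that the credentials themselves are fine (and that the 401 comes from the endpoint not forwarding them), the adapter repo can be queried directly through the Hub API; the token value below is a placeholder:

```python
from huggingface_hub import model_info
from huggingface_hub.utils import RepositoryNotFoundError

TOKEN = "hf_..."  # placeholder: a token with read access to the org repo

try:
    info = model_info(
        "IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA", token=TOKEN
    )
    print("Adapter visible:", info.id)
except RepositoryNotFoundError:
    # Same 401 / "Repository Not Found" that the TGI launcher reports
    print("Adapter not visible with this token")
```

With a valid org token this succeeds, which is consistent with the PEFT snippet above working once authenticated.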