Inference Endpoint can't access private LoRA

There seems to be an internal authentication issue when deploying a private LoRA adapter with Inference Endpoints / TGI 3.1.0:

We have a private LoRA adapter for meta-llama/Llama-3.2-3B-Instruct on our organization account, and I’m trying to deploy it on an Inference Endpoint serving this Llama base model, using the env variable LORA_ADAPTERS=IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA.
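
For reference, this is roughly how the endpoint is configured. A sketch using huggingface_hub's create_inference_endpoint; the endpoint name, instance details, and image tag are illustrative, and only the repo IDs and LORA_ADAPTERS match our actual setup:

from huggingface_hub import create_inference_endpoint

# Sketch of the endpoint configuration. Endpoint name, vendor, region,
# instance size/type, and image URL are illustrative assumptions.
endpoint = create_inference_endpoint(
    "llama-3-2-3b-summarize",  # hypothetical endpoint name
    repository="meta-llama/Llama-3.2-3B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:3.1.0",
        "env": {
            "MODEL_ID": "/repository",
            "LORA_ADAPTERS": "IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA",
        },
    },
)
endpoint.wait()  # block until the endpoint is running (or fails)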
The deployment fails, logs are inaccessible, and I’m seeing the following error on the Endpoint’s Overview page:

[Server message] Endpoint failed to start

Exit code: 1.
RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67be036e-09142c4566699c28451be1ca;f6aa8277-5e41-46bb-814b-59b36d834f05)

Repository Not Found for url:
https://huggingface.co/api/models/IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

Error: DownloadError

Testing with a public LoRA model instead (e.g. LORA_ADAPTERS=RayBernard/llama3.2-3B-ft-reasoning) works smoothly.
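
For completeness, this is how I verify a deployed adapter responds. A sketch with InferenceClient, assuming a placeholder endpoint URL and that your huggingface_hub version supports the adapter_id parameter for TGI multi-LoRA routing:

from huggingface_hub import InferenceClient

# Endpoint URL is a placeholder; adapter_id selects the LoRA loaded via LORA_ADAPTERS.
client = InferenceClient("https://<endpoint-url>.endpoints.huggingface.cloud")
output = client.text_generation(
    "Summarize: ...",
    adapter_id="RayBernard/llama3.2-3B-ft-reasoning",
    max_new_tokens=200,
)
print(output)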

Loading the private adapter in Python also works without issue, once authenticated:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then apply the private LoRA adapter on top.
# Requires prior authentication (e.g. `huggingface-cli login` or HF_TOKEN).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct", device_map="auto"
)
model = PeftModel.from_pretrained(
    model, "IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA"
)

Hi @nikos-ir, we’re looking into this and I’ll update you soon.


Hi @nikos-ir, can you please also add your HF token as an env variable (HF_TOKEN) on the Endpoint and try once more? Let us know if you’re still running into issues.
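
You can do this from the Endpoint settings in the UI, or programmatically. A sketch with update_inference_endpoint; the endpoint name and image details are illustrative, and MY_HF_TOKEN is a hypothetical local env var holding your token:

import os
from huggingface_hub import update_inference_endpoint

# Sketch: pass your token into the container via the endpoint's env vars.
# Endpoint name and custom_image values are illustrative assumptions.
update_inference_endpoint(
    "llama-3-2-3b-summarize",  # hypothetical endpoint name
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:3.1.0",
        "env": {
            "MODEL_ID": "/repository",
            "LORA_ADAPTERS": "IntRobotics/Llama-3.2-3B-Instruct-Summarize-LoRA",
            "HF_TOKEN": os.environ["MY_HF_TOKEN"],  # hypothetical source for your token
        },
    },
)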


Hey Megan!
Setting the HF_TOKEN env variable indeed solved the problem.
Thank you!

