401 unauthorized for create_inference_endpoint

Hello,
I’m trying to create an inference endpoint with a custom Docker image using the huggingface_hub library, because the web interface doesn’t allow environment variables to be set.

I first log in with huggingface-cli login and then call create_inference_endpoint, which fails with a 401 error. I then tried calling login() inside the Python script in case that would help, and got the identical error shown below. I’ve also created a new access token and upgraded huggingface_hub from 0.20.x to 0.23.0, with no change either way.
I’m at a bit of a loss. I suppose I could set the environment variables when building the Docker image, but that’s a pretty messy workaround for something that ought to work.

Enter your token (input will not be visible):
Add token as git credential? (Y/n)
Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/srt/.cache/huggingface/token
Login successful
Traceback (most recent call last):
  File "/home/srt/anaconda3/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/home/srt/anaconda3/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.endpoints.huggingface.cloud/v2/endpoint/srt-primis

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/srt/hf_test/create_endpoint.py", line 5, in <module>
    endpoint = create_inference_endpoint(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srt/anaconda3/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 7264, in create_inference_endpoint
    hf_raise_for_status(response)
  File "/home/srt/anaconda3/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 401 Client Error: Unauthorized for url: https://api.endpoints.huggingface.cloud/v2/endpoint/srt-primis (Request ID: IWBbJQ)

401 Unauthorized
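For anyone trying to reproduce this, here is a minimal sketch of the kind of call described above. It assumes the huggingface_hub 0.23 signature of create_inference_endpoint, where environment variables for a custom image go under the "env" key of the custom_image dict. The repository, image URL, instance size/type and any env values are hypothetical placeholders; check what your account actually offers. In huggingface_hub the create request appears to be sent to .../endpoint/<namespace>, so "srt-primis" in the URL above should be the namespace being targeted. Passing namespace and token explicitly at least removes any doubt about which account and which token are used.

```python
import os


def build_endpoint_kwargs(namespace: str, image_url: str, env: dict) -> dict:
    """Collect keyword arguments for create_inference_endpoint (placeholder values)."""
    return {
        "repository": "your-org/your-model",  # hypothetical model repo
        "framework": "pytorch",
        "accelerator": "gpu",
        "vendor": "aws",
        "region": "us-east-1",
        "instance_size": "x1",  # hypothetical; use a size from your catalog
        "instance_type": "nvidia-a10g",  # hypothetical; use a type from your catalog
        "type": "protected",
        "namespace": namespace,  # user or organization that will own the endpoint
        "custom_image": {
            "url": image_url,
            "health_route": "/health",
            "env": dict(env),  # environment variables passed to the container
        },
    }


def deploy(name: str, namespace: str, image_url: str, env: dict):
    # Imported lazily so build_endpoint_kwargs works without huggingface_hub installed.
    from huggingface_hub import create_inference_endpoint

    kwargs = build_endpoint_kwargs(namespace, image_url, env)
    # Send the token explicitly instead of relying on the cached login
    # (raises KeyError if HF_TOKEN is not set).
    return create_inference_endpoint(name, token=os.environ["HF_TOKEN"], **kwargs)
```

Calling deploy("my-endpoint", "srt-primis", "ghcr.io/example/image:latest", {"MY_SETTING": "1"}) would then create the endpoint under that namespace, given a token that is allowed to do so.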


I am also trying to create an inference endpoint for a private model in my organization. I have the admin role in my organization and have logged in with a write access token, but I am still getting this error. I’m not sure what the reason might be.
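For the organization case, one thing worth checking is whether the token you are sending really belongs to a member of the target organization, and what role it carries. A minimal sketch, assuming the dict shape returned by HfApi().whoami() (a "name" key, an "orgs" list of dicts with their own "name", and for access tokens "auth" → "accessToken" → "role"); the helper names are hypothetical:

```python
from typing import Optional


def can_create_in_namespace(whoami_info: dict, namespace: str) -> bool:
    """True if the token's owner is `namespace` itself or a member of that org."""
    if whoami_info.get("name") == namespace:
        return True
    org_names = {org.get("name") for org in whoami_info.get("orgs", [])}
    return namespace in org_names


def token_role(whoami_info: dict) -> Optional[str]:
    """Read the token role (for example "read" or "write") if whoami reports it."""
    return whoami_info.get("auth", {}).get("accessToken", {}).get("role")


# Usage (needs network access and a valid token):
# from huggingface_hub import HfApi
# info = HfApi().whoami()
# print(can_create_in_namespace(info, "my-org"), token_role(info))
```

This only checks membership and role as reported by whoami; it does not prove the organization has Inference Endpoints enabled or billing set up.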


Has anyone found a solution for this yet? Currently struggling with the same issue.

I have a fine-tuned llama3-8B pipeline built with transformers in Colab. I’ve tested it in Colab and it works fine. The only problem is that I can’t get past this error saying the model is gated.

I’ve created an environment variable HF_TOKEN_API in my runtime and pass it everywhere in the code.

I’ve also added it, with its value, as an environment variable in the Inference Endpoint settings, and I get the same error.

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B.
401 Client Error. (Request ID: Root=1-668feaf5-37525ba76173671b77b38d68;fad4290c-56e7-4679-a073-69c13b9ec2e8)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

Application startup failed. Exiting.
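One likely factor: huggingface_hub (and transformers through it) looks for the token in the HF_TOKEN environment variable, so a custom name such as HF_TOKEN_API is not picked up on its own when the model is downloaded. A minimal sketch, assuming the token is stored under HF_TOKEN_API, that copies it into HF_TOKEN before anything tries to download the gated model (the helper name is hypothetical):

```python
import os
from typing import MutableMapping, Optional


def export_hf_token(env: MutableMapping[str, str] = os.environ,
                    source_var: str = "HF_TOKEN_API") -> Optional[str]:
    """Copy a token stored under a custom name into HF_TOKEN.

    An HF_TOKEN that is already set is left untouched. Returns the value of
    HF_TOKEN afterwards, or None if neither variable is set.
    """
    if not env.get("HF_TOKEN") and env.get(source_var):
        env["HF_TOKEN"] = env[source_var]
    return env.get("HF_TOKEN") or None
```

Calling export_hf_token() at the very top of the script, before loading the model, is enough in Colab; on an Inference Endpoint the simpler route is to name the variable HF_TOKEN in the endpoint settings directly.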

Hey gang,

I figured it out.

So Hugging Face has some predefined environment variables that are there for us to use.

Link: Environment variables (huggingface.co)
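Roughly, huggingface_hub picks a token in this order: one passed explicitly (the token= argument), then the HF_TOKEN environment variable, then the file written by huggingface-cli login (by default ~/.cache/huggingface/token, as in the login output in the first post). The function below is a simplified illustration of that precedence, not the library's actual code; it ignores HF_TOKEN_PATH, XDG_CACHE_HOME, the legacy HUGGING_FACE_HUB_TOKEN name and Colab secrets.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(explicit: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Simplified illustration of token precedence (not huggingface_hub's code)."""
    # 1. A token passed directly to the call wins.
    if explicit:
        return explicit
    # 2. Otherwise the HF_TOKEN environment variable.
    if env.get("HF_TOKEN"):
        return env["HF_TOKEN"]
    # 3. Otherwise the file saved by `huggingface-cli login` under HF_HOME.
    hf_home = env.get("HF_HOME")
    base = Path(hf_home) if hf_home else Path.home() / ".cache" / "huggingface"
    token_file = base / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

This is why setting HF_TOKEN helps on an endpoint: the container has no login file, so the environment variable is the only place the library can find a token unless your code passes one explicitly.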

The idea is that when you build your pipeline, you need to log in either via the Hugging Face CLI (huggingface-cli login) or with the login function from huggingface_hub, like so:

from huggingface_hub import login

The way to get past this error is to put your Hugging Face token in the predefined environment variable called HF_TOKEN, like so:

# Declare the env variable in the pipeline and use it where needed
import os

# The token needs to be either a write token or a fine-grained token
os.environ["HF_TOKEN"] = "your Hugging Face token"

Then pass it everywhere you need the Hugging Face token, like so:

# When you log in

login(token=os.environ["HF_TOKEN"], add_to_git_credential=True)

# When you define training arguments
from transformers import TrainingArguments

training_args = TrainingArguments(
    # ... other training arguments
    hub_token=os.environ["HF_TOKEN"],
)

# When you push the model and tokenizer to the HF repo
# (recent transformers versions take token=; use_auth_token is deprecated)

model.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])
tokenizer.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])

The last step is to declare HF_TOKEN as an environment variable in the Inference Endpoint settings, set its value to your token, and then deploy the endpoint.
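If the endpoint runs your own code (a custom image or handler), a fail-fast check at startup can make a missing token obvious instead of having it surface later as a gated-repo 401. A minimal sketch; the function name and message are hypothetical, and it only checks that HF_TOKEN is set, not that the token actually has access to the gated repo:

```python
import os
from typing import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN, or raise a clear error if the endpoint was deployed without it."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as an environment variable in the "
            "Inference Endpoint settings and redeploy."
        )
    return token
```

Call it once before loading the model, so the startup log shows this message rather than the generic gated-repo error.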

This should sort out your issues.