Inference Endpoints 401 Error

TheRealPapaBear · July 11, 2024, 2:34pm

I have a fine-tuned Llama 3 8B pipeline built with transformers in Colab. I’ve tested it in Colab and it works fine. The only problem is that I can’t seem to get past this error about the model being gated.

I’ve created an env variable HF_TOKEN_API in my runtime and I’m passing it everywhere in the code.

I’ve also added it, along with its value, as an env variable in the Inference Endpoint settings, and I’m getting the same thing.

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B.
401 Client Error. (Request ID: Root=1-668feaf5-37525ba76173671b77b38d68;fad4290c-56e7-4679-a073-69c13b9ec2e8)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

Application startup failed. Exiting.

TheRealPapaBear · July 15, 2024, 8:59am

I figured it out.

So Hugging Face has some preset environment variables that are there for us to use.

Link: Environment variables (huggingface.co)

The idea is that when you build your pipeline you need to log in, either via the huggingface-cli or by importing login from huggingface_hub, like so:

import os  # needed for the os.environ lookups below

from huggingface_hub import login

To get past this error, store your Hugging Face token in the preset Hugging Face environment variable called HF_TOKEN, like so:

# Declare the env variable in the pipeline and use it where needed.
# The token needs to be either a write or a fine-grained token.

os.environ["HF_TOKEN"] = "Your Hugging Face token"
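If you want the pipeline to fail early when the variable is missing, a small guard like the one below might help. This is only a sketch: resolve_hf_token is a hypothetical helper, not part of huggingface_hub, and the look-alike check simply assumes the token could have ended up under a different name (such as HF_TOKEN_API from the original question).

```python
import os


def resolve_hf_token(env=None) -> str:
    """Return the token stored in HF_TOKEN, or raise with a readable message.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token

    # Assumption: a variable whose name contains both "HF" and "TOKEN"
    # (for example HF_TOKEN_API) is probably a misnamed token variable.
    lookalikes = sorted(
        name
        for name in env
        if name != "HF_TOKEN" and "HF" in name.upper() and "TOKEN" in name.upper()
    )
    hint = ""
    if lookalikes:
        hint = " Found similarly named variable(s): " + ", ".join(lookalikes) + "."
    raise RuntimeError("HF_TOKEN is not set." + hint)
```

Calling resolve_hf_token() at the top of the pipeline gives a clear message instead of a 401 further down the line.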

Pass this everywhere you need to use the Hugging Face token, like so:

# When you log in

login(token=os.environ["HF_TOKEN"], add_to_git_credential=True)

# When you define training arguments

training_args = TrainingArguments(
    # ... other training arguments
    hub_token=os.environ["HF_TOKEN"],
)

# When you push the model and tokenizer to the HF repo
# (recent transformers releases take token=; use_auth_token= is deprecated)

model.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])
tokenizer.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])

The last step is to declare HF_TOKEN as an env variable in the Inference Endpoints settings, set its value to your token, and then deploy the endpoint.
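Before deploying, it may be worth checking that the token can actually reach the gated file named in the error message. The sketch below uses only the standard library and sends the token as a bearer header to the config.json URL from the traceback. The function names are hypothetical, and treating 401/403 as "no access" is my assumption about how the Hub answers for gated repos.

```python
import urllib.error
import urllib.request


def build_gated_check_request(repo_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for the repo's config.json, authenticated with the token."""
    url = f"https://huggingface.co/{repo_id}/resolve/main/config.json"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def can_access_gated_repo(repo_id: str, token: str, opener=urllib.request.urlopen) -> bool:
    """Return True if config.json is readable with this token, False on 401/403.

    `opener` is injectable so the check can be exercised without network access.
    """
    request = build_gated_check_request(repo_id, token)
    try:
        with opener(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # Assumption: 401 means not authenticated, 403 means access not granted.
        if err.code in (401, 403):
            return False
        raise
```

For example, can_access_gated_repo("meta-llama/Meta-Llama-3-8B", os.environ["HF_TOKEN"]) should return True once access to the model has been granted and the token is valid.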

This should sort out your issues.

system · July 17, 2024, 1:16pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
