Inference Endpoints 401 Error

I have a fine-tuned llama3-8B pipeline in Colab, built with transformers. I've tested it in Colab and it works fine. The only problem is that I can't get past this error saying the model is gated.

I've created an env variable HF_TOKEN_API in my runtime and I'm passing it everywhere in the code.

I've also added it as an env variable, along with its value, in the Inference Endpoint settings, and I'm still getting the same error:

OSError: You are trying to access a gated repo.
Make sure to have access to it at
401 Client Error. (Request ID: Root=1-668feaf5-37525ba76173671b77b38d68;fad4290c-56e7-4679-a073-69c13b9ec2e8)

Cannot access gated repo for url
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

Application startup failed. Exiting.

I figured it out.

So Hugging Face has some predefined environment variables that are there for us to use.

Link: Environment variables (
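The reason HF_TOKEN_API never helped is that the library only looks for a token under specific names. The sketch below is a simplified, hypothetical model of that lookup (not huggingface_hub's actual code; the real sources and their order vary by version, and Colab secrets may also be consulted). It assumes the standard names HF_TOKEN and the legacy HUGGING_FACE_HUB_TOKEN, plus the token file under HF_HOME that `login()` writes. `resolve_hf_token` is an illustrative helper name.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str]) -> Optional[str]:
    """Return the token a Hub client would pick up implicitly, or None.

    Simplified, hypothetical model: real huggingface_hub versions may
    check additional sources in a different order.
    """
    # Only these standard names are consulted; a custom name such as
    # HF_TOKEN_API is never looked at.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    # Otherwise fall back to the token file that login() writes under HF_HOME
    # (defaulting to ~/.cache/huggingface in this sketch).
    hf_home = Path(env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

With only HF_TOKEN_API set and no saved login, this model returns None, which matches the "You must be authenticated" 401 in the question.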

The idea is that when you build your pipeline, you need to log in, either via the Hugging Face CLI or by importing login from huggingface_hub, like so:

from huggingface_hub import login

To get past this error, create a Hugging Face token and assign it to the predefined Hugging Face environment variable HF_TOKEN, like so:

import os

# Declare the env variable in the pipeline and use it where needed
os.environ["HF_TOKEN"] = "your_hugging_face_token"  # must be a write or fine-grained token
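If the token is already stored under a custom name (HF_TOKEN_API in the question), one option is to copy it into HF_TOKEN instead of pasting the token string into the notebook. The helper below is a hypothetical sketch, not part of any library; it only touches the mapping you pass in.

```python
from typing import MutableMapping


def promote_custom_token(
    env: MutableMapping[str, str],
    source: str = "HF_TOKEN_API",  # custom name used in the question
    target: str = "HF_TOKEN",      # standard name huggingface_hub reads
) -> bool:
    """Copy a token stored under a custom variable name into HF_TOKEN.

    Returns True if the value was copied, False if the target was already set.
    """
    if env.get(target):
        return False  # respect an HF_TOKEN that is already configured
    value = env.get(source)
    if not value:
        raise KeyError(f"Neither {target} nor {source} is set")
    env[target] = value
    return True
```

In the notebook you would call `promote_custom_token(os.environ)` before the code that logs in, loads, or pushes the model. Changing `os.environ` this way only affects the current runtime, so the endpoint still needs its own HF_TOKEN setting.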

Then pass it everywhere you need the Hugging Face token, like so:

# When you login

login(token=os.environ["HF_TOKEN"], add_to_git_credential=True)

# When you define training arguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    # ... other training arguments
    hub_token=os.environ["HF_TOKEN"],
)

# When you push the model and tokenizer to the HF repo

model.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])
tokenizer.push_to_hub(repo_name, token=os.environ["HF_TOKEN"])

The last step is to declare HF_TOKEN as an environment variable in the Inference Endpoint settings, set its value to your token, and then deploy the endpoint.
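Because the endpoint container only sees the variables configured for it, a fail-fast check at startup can turn the generic 401 into a clearer message. This is a minimal sketch assuming you deploy with a custom handler.py; `require_hf_token` is a hypothetical helper, not an Inference Endpoints API.

```python
import os
from typing import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or fail with an actionable message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set in this environment. Add it in the Inference "
            "Endpoint settings before loading a gated model such as "
            "meta-llama/Meta-Llama-3-8B."
        )
    return token
```

In a custom handler you might call it at the top of `__init__` and pass the result explicitly, for example `AutoModelForCausalLM.from_pretrained(path, token=require_hf_token())`, so a missing variable is reported before any download is attempted.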

This should sort out your issues.
