Using Token to Access Llama2

I have been granted access to the Llama 2 70B model (the model page says so when I visit it). I have a token. I tried the code below (with “my token” replaced by my actual token in quotes), but I get this error: “OSError: You are trying to access a gated repo.
Make sure to request access at meta-llama/Llama-2-70b-chat-hf · Hugging Face and pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>.”

MY CODE:

from transformers import AutoModelForCausalLM, AutoTokenizer
import os

hf_access_token = "my token"
os.environ["HF_ACCESS_TOKEN"] = hf_access_token

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf")


I also tried this code, with my actual token in place of <your_token_here>:

from transformers import LlamaForCausalLM, LlamaTokenizer
import sentencepiece

token = "<your_token_here>"  # Replace <your_token_here> with the token you obtained

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf", use_auth_token=token)
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf", use_auth_token=token)

Again, I got a message about it being a gated model. Can anyone tell me what I am doing wrong? Thank you.


I’ve been having the same silly problem. Try changing the parameter name from use_auth_token to simply token (ignoring the HF documentation; use_auth_token has been deprecated in favor of token in recent transformers releases):

token = "<your_token_here>"  # Replace <your_token_here> with the token you obtained

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf", token=token)
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf", token=token)
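This also explains the OP’s first attempt: as far as I can tell, transformers never reads an HF_ACCESS_TOKEN variable; huggingface_hub looks for HF_TOKEN (and, in older releases, HUGGING_FACE_HUB_TOKEN). A quick way to check which token will actually be sent (a sketch; hf_xxx is a dummy placeholder, not a real token):

```python
import os
from huggingface_hub import get_token

os.environ["HF_TOKEN"] = "hf_xxx"  # dummy placeholder, not a real token

# get_token() resolves the token the same way download calls do:
# the HF_TOKEN environment variable first, then the saved token file.
print(get_token())
```

If this prints None (or a stale token) instead of the value you exported, the libraries are not seeing your token.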

from huggingface_hub import notebook_login

Run this and then:

notebook_login()

Paste your token when prompted.

In my case (and, I suspect, many others’), this will not work: I wasn’t able to interact with the script as this command requires. For context, I send a request to a remote machine that executes a script to download the HF model specified by the user, runs inference, and sends back the response. In noninteractive situations like that, use DrewG’s solution above; in interactive sessions, El-chapoo’s should work well.