Hi, I have obtained access to the Meta Llama 3 models and am trying to run inference using the sample code from the model card. When I run my inference script, it gives me an error: 'Cannot access gated repo for url huggingface..../meta-llama.....config.json'.
So my question is: how can I access this model from my inference script? Do I need to pass an authentication/API token so my script can access the model?
Also, how do I get the Hugging Face version of the model locally?
You can do it by passing your Hugging Face access token to the pipeline, like this.
Sample code
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
Actual Code
import transformers
import torch

# Never hard-code your token in code you share or upload; it is written inline here only for illustration.
hf_token = "hf_*********"
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=hf_token,  # authenticates the download of the gated model files
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
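As the comment in the code above says, avoid hard-coding the token. A minimal sketch that reads it from an environment variable instead (recent versions of huggingface_hub also pick up a variable named HF_TOKEN automatically):

import os

# Set the token in your shell first, e.g.  export HF_TOKEN=hf_...
hf_token = os.environ["HF_TOKEN"]
# then pass token=hf_token to transformers.pipeline exactly as above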
You can create as many tokens as you want on the Hugging Face website (Settings > Access Tokens). Name your tokens in such a way that you can easily tell which is which.
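For the other part of the question, getting a local copy of the model, one option is snapshot_download from huggingface_hub; a minimal sketch, where the local_dir path is only an example:

from huggingface_hub import snapshot_download

# Download every file of the gated repo into local_dir.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="./Meta-Llama-3.1-8B-Instruct",
    token=hf_token,  # or rely on huggingface-cli login / the HF_TOKEN environment variable
)

You can then pass that local directory as model= to the pipeline instead of the repo id.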
nielsr
September 27, 2024, 6:36am
Make sure to authenticate with huggingface-cli login in the terminal before running the script.
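If you would rather authenticate from Python instead of the terminal, huggingface_hub also exposes a login() helper that does the same thing; a minimal sketch:

from huggingface_hub import login

# Prompts interactively for your access token; alternatively call login(token="hf_...").
login()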
system
Closed
September 30, 2024, 9:42pm
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.