Hi, I have obtained access to the Meta Llama 3 models and am trying to run inference using the sample code from the model card. When I run my inference script, it gives me an error: 'Cannot access gated repo for url huggingface..../meta-llama.....config.json'.
So my question is: how can I access this model from my inference script? Do I need to pass an authentication/API token so my script can access the model?
Also, how do I get the Hugging Face version of the model locally?
You can do it by passing your Hugging Face access token to the pipeline, like this.
Sample code
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
Actual Code
import transformers
import torch

# Never hard-code your token in code you share or upload; it is written inline here only for illustration.
hf_token = "hf_*********"
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=hf_token,  # authenticates the download of the gated model files
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
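As the comment in the code above says, avoid hard-coding the token. A minimal sketch that reads it from an environment variable instead (recent versions of huggingface_hub also pick up a variable named HF_TOKEN automatically):

import os

# Set the token in your shell first, e.g.  export HF_TOKEN=hf_...
hf_token = os.environ["HF_TOKEN"]
# then pass token=hf_token to transformers.pipeline exactly as above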
You can create as many tokens as you want on the Hugging Face website (Settings > Access Tokens). Name your tokens in such a way that you can easily tell which is which.
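For the other part of the question, getting a local copy of the model, one option is snapshot_download from huggingface_hub; a minimal sketch, where the local_dir path is only an example:

from huggingface_hub import snapshot_download

# Download every file of the gated repo into local_dir.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="./Meta-Llama-3.1-8B-Instruct",
    token=hf_token,  # or rely on huggingface-cli login / the HF_TOKEN environment variable
)

You can then pass that local directory as model= to the pipeline instead of the repo id.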
nielsr
September 27, 2024, 6:36am
Make sure to authenticate with huggingface-cli login in the terminal before running the script.
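If you would rather authenticate from Python instead of the terminal, huggingface_hub also exposes a login() helper that does the same thing; a minimal sketch:

from huggingface_hub import login

# Prompts interactively for your access token; alternatively call login(token="hf_...").
login()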
system
Closed
September 30, 2024, 9:42pm
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.