How to use gated model in inference

John6666 · September 26, 2024, 11:27pm

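You can do this by passing your access token to the pipeline. Compare the sample code with the actual code below: the only change is the hf_token variable and the token=hf_token argument.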

Sample code

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Actual code

import transformers
import torch

hf_token = "hf_*********" # Never hard-code your real token in code you upload or share!

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=hf_token,
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

You can create as many access tokens as you want on the page below. Name each token so that you can easily tell which is which.
https://huggingface.co/settings/tokens
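
Since the token should never be written directly into code you upload, one option is to read it from an environment variable instead. The following is only a minimal sketch: the helper name load_hf_token and the variable name HF_TOKEN are assumptions made for this example, not part of the original code.

```python
import os


def load_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the access token stored in an environment variable.

    load_hf_token and the default name HF_TOKEN are illustrative
    assumptions; use whatever variable name suits your setup.
    Raises RuntimeError if the variable is missing or empty, so the
    problem shows up before any model download starts.
    """
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token


# Usage (replaces the hard-coded hf_token line in the actual code):
#   hf_token = load_hf_token()
#   pipeline = transformers.pipeline(..., token=hf_token)
```

Set the variable in your shell before running the script (for example, export HF_TOKEN=hf_... on Linux or macOS) so the token itself stays out of the source file.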

