I solved it by passing a single string formatted with the official Llama 2 prompt template (see Llama 2 is here - get it on Hugging Face). I don't know why the default SageMaker Llama endpoint doesn't work that way, but this works for me:
import json
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"
headers = {
    "Authorization": "Bearer hf_XXXXXXXXXXXXXXXXXX",
    "Content-Type": "application/json",
}

def query(payload):
    # Wrap the user message in the Llama 2 chat template, with the
    # system prompt between <<SYS>> and <</SYS>>.
    json_body = {
        "inputs": f"[INST] <<SYS>> Your job is to talk like a pirate. Every response must sound like a pirate. <</SYS>> {payload} [/INST] ",
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.7},
    }
    data = json.dumps(json_body)
    response = requests.post(API_URL, headers=headers, data=data)
    try:
        return json.loads(response.content.decode("utf-8"))
    except json.JSONDecodeError:
        return response

data = query("Just say hi!")
# generated_text echoes the prompt, so keep only the part after [/INST]
print(data[0]['generated_text'].split('[/INST] ')[1])
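If you want follow-up turns, the same single-string idea applies: flatten the whole conversation into one prompt in the Llama 2 template and send it through the same query function. Below is a minimal sketch of such a helper; build_llama2_prompt is just an illustrative name I made up, and the placement of the <s>/</s> markers follows my reading of the template, so double-check it against the Llama 2 docs before relying on it.

def build_llama2_prompt(system_prompt, turns):
    # turns: list of (user_message, assistant_reply) pairs; the final pair
    # should have assistant_reply=None so the model generates the answer.
    prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for i, (user_msg, assistant_reply) in enumerate(turns):
        if i > 0:
            # later user turns open a new [INST] block
            prompt += f"<s>[INST] {user_msg} [/INST] "
        else:
            prompt += f"{user_msg} [/INST] "
        if assistant_reply is not None:
            # completed assistant turns are closed with </s>
            prompt += f"{assistant_reply} </s>"
    return prompt

# Example: one completed exchange plus a new question
prompt = build_llama2_prompt(
    "Your job is to talk like a pirate.",
    [("Just say hi!", "Ahoy, matey!"),
     ("How are you today?", None)],
)
data = query(prompt)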