I am trying to call the Hugging Face Inference API to generate text using Llama-2 (specifically, Llama-2-7b-chat-hf). Following this documentation page, I am able to generate text using the following code:
import json
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"
headers = {
    "Authorization": "Bearer hf_XXXXXXXXXXXXXXXX",
    "Content-Type": "application/json",
}

def query(payload):
    # Wrap the prompt string in the "inputs" field the API expects
    data = json.dumps({"inputs": payload})
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query("Can you please let us know more details about your ")
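This plain-string call works; printing the response shows the usual text-generation output shape (a list with a generated_text field), something like:

print(data)  # e.g. [{'generated_text': 'Can you please let us know more details about your ...'}]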
However, when I try to include chat context in the input, the Inference API cannot parse my request.
For example, on my own AWS endpoint (following this), I am able to send this JSON body to Llama 2:
{
    "inputs": [
        [
            {"role": "system", "content": "You are chat bot who writes songs"},
            {"role": "user", "content": "Write a rap about Barbie"}
        ]
    ],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}
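Concretely, here is a minimal sketch of the failing call, reusing API_URL and headers from the code above (chat_payload is just my name for the same body):

chat_payload = {
    "inputs": [
        [
            {"role": "system", "content": "You are chat bot who writes songs"},
            {"role": "user", "content": "Write a rap about Barbie"},
        ]
    ],
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
}
# Send the nested chat structure directly instead of a plain prompt string
response = requests.post(API_URL, headers=headers, data=json.dumps(chat_payload))
print(response.json())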
When I send that JSON body, I get this error in the response:
Failed to deserialize the JSON body into the target type: inputs: invalid type: sequence, expected a string at line 1 column 11.
From the error, it looks like the hosted endpoint expects "inputs" to be a single string rather than a nested list of chat messages, but I haven’t been able to figure out a format that works. Is there an example or documentation anywhere for using Llama 2 with the Hugging Face Inference API?