I’m using the Llama 3 8B Instruct model (`meta-llama/Meta-Llama-3-8B-Instruct`) through the serverless Inference API, like this:
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct"
headers = {"Authorization": "Bearer mytoken"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Can you please let us know more details about your ",
})
print(output)
Everything works great, but I have a few questions:

1. How do I use a system prompt? Do I need to send it with every query, or only once?
2. Is it possible for the model to remember past conversations, or do I have to resend the previous turns with every query?
3. How do I make it do plain completion versus chat, like on the Hugging Face Chat UI?
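For reference, here is my current guess at how the chat formatting would work — a sketch that assumes the special tokens from the Llama 3 model card (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`); I'm not sure this is the right way to pass the system prompt and history:

```python
# Sketch (my guess, not verified): build the raw prompt string for the
# Llama 3 instruct format, with a system prompt plus the full chat history,
# and send the whole thing as "inputs" on every request.
def build_prompt(system, history):
    # history is a list of (role, content) pairs, role in {"user", "assistant"}
    prompt = "<|begin_of_text|>"
    prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    for role, content in history:
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    # leave an open assistant header so the model generates the reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_prompt(
    "You are a helpful assistant.",
    [("user", "Hello"), ("assistant", "Hi! How can I help?"), ("user", "Tell me a joke.")],
)
print(prompt)
```

If this is right, I assume the API itself is stateless, so the system prompt and all previous turns would have to be re-sent inside `inputs` on every call — is that correct?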