How do I send system prompts using inference api serverless, llama3 8b instruct model

I’m using lama 3 8 b model, using the inference api serverless, like this:

import requests

API_URL = ""
headers = {"Authorization": "Bearer mytoken"}

def query(payload):
	response =, headers=headers, json=payload)
	return response.json()
output = query({
	"inputs": "Can you please let us know more details about your ",

everything works great, but how do I use a system prompt, & do I need to send it with every query or only once, also is it possible to have it remember conversations or do I’ve to send old ones with every query, finally, how do I make it to complete or to chat, like on the huggingface chat ui

1 Like

Hey I found this post that helped me:

Meta Llama 3 | Model Cards and Prompt formats for the appropriate tags of Llama 3