How to Implement Few-Shot Prompting with the LLaMA-2 Chat Model

Hi, I want to know how to implement few-shot prompting with the LLaMA-2 chat model. Currently, I have a basic zero-shot prompt setup as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Please answer the math question."},
    {"role": "user", "content": "2+2=?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)
outputs = tokenizer.batch_decode(generated_ids)
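
For readability, I also strip the prompt tokens before decoding so that only the model's reply is printed (a minimal sketch, assuming a single sequence in the batch):

# Decode only the newly generated tokens, skipping the prompt portion
# (assumes generated_ids contains exactly one sequence).
prompt_length = input_ids.shape[-1]
response = tokenizer.decode(generated_ids[0, prompt_length:], skip_special_tokens=True)
print(response)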

I’m considering adding a few examples to the messages sequence for few-shot prompting. However, I haven’t found any specific guidelines on this for LLaMA-2. Drawing inspiration from a blog post about few-shot prompting with the OpenAI API, my idea is to insert several user/assistant interactions right after the system prompt. It looks like this:

messages = [
    {"role": "system", "content": "Please answer the math question."},
    {"role": "user", "content": "1+1=?"},  # example 1
    {"role": "assistant", "content": "2"},  # example 1
    {"role": "user", "content": "1+2=?"},  # example 2
    {"role": "assistant", "content": "3"},  # example 2
    {"role": "user", "content": "2+2=?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)
outputs = tokenizer.batch_decode(generated_ids)
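
To sanity-check how the example turns are flattened into a single prompt, I also render the conversation as plain text before generating (the exact [INST]/<<SYS>> markup comes from the tokenizer's bundled Llama-2 chat template, so I'm assuming it interleaves the extra user/assistant pairs as expected):

# Render the conversation as a string instead of token IDs, so the
# few-shot turns can be inspected by eye before generation.
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt_text)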

Is this approach correct?


I also have the same question. Were you able to find the best practice for doing this?