Hi, I want to know how to implement few-shot prompting with the LLaMA-2 chat model. Currently, I have a basic zero-shot prompt set up as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
messages = [
    {"role": "system", "content": "Please answer the math question."},
    {"role": "user", "content": "2+2=?"},
]
# Render the messages into LLaMA-2's chat prompt format and tokenize them
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)
outputs = tokenizer.batch_decode(generated_ids)
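To look at just the model's reply rather than the full prompt plus completion, I slice off the prompt tokens before decoding. This is only how I'm checking the output, nothing specific to LLaMA-2:

# Decode only the newly generated tokens, i.e. everything after the prompt
answer = tokenizer.decode(generated_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)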
I’m considering adding a few examples to the messages sequence for few-shot prompting. However, I haven’t found any specific guidelines on this for LLaMA-2. Drawing inspiration from a blog post about few-shot prompting with the OpenAI API, my idea is to insert several user/assistant exchanges right after the system prompt, like this:
messages = [
    {"role": "system", "content": "Please answer the math question."},
    {"role": "user", "content": "1+1=?"},        # example 1
    {"role": "assistant", "content": "2"},       # example 1
    {"role": "user", "content": "1+2=?"},        # example 2
    {"role": "assistant", "content": "3"},       # example 2
    {"role": "user", "content": "2+2=?"},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)
outputs = tokenizer.batch_decode(generated_ids)
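To see whether the template actually stitches the example turns together the way I expect, I also print the rendered prompt string. The expected output in the comment below is just my guess at LLaMA-2's multi-turn format, not something I've confirmed:

# Inspect how the few-shot examples are rendered into the prompt text
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt_text)
# My guess is something roughly like:
# <s>[INST] <<SYS>>
# Please answer the math question.
# <</SYS>>
#
# 1+1=? [/INST] 2 </s><s>[INST] 1+2=? [/INST] 3 </s><s>[INST] 2+2=? [/INST]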
Is this approach correct?