Confusion about when to use the dict-style chat dialogue vs. when to format with the chat template

Hi there!

I need to query several language models (Llama-3.1 8B Instruct, Gemma-2 9B Instruct, and Aya Expanse 8B).

A minimal reproducible example for Gemma-2 9B is the following (source: google/gemma-2-9b-it · Hugging Face):

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-9b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
# generated_text contains the whole conversation; the last entry is the model's reply
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world.  So, what be yer pleasure, eh? 🦜

Here, the messages are passed in a list-of-dicts ‘dialogue’ format. Many models, however, define a specific raw prompt format of their own; e.g., the Llama 3.1 docs (Llama 3.1 | Model Cards and Prompt formats) give quite strict prompt-formatting guidelines:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython<|eot_id|><|start_header_id|>user<|end_header_id|>

Write code to check if number is prime, use that to see if the number 7 is prime<|eot_id|><|start_header_id|>assistant<|end_header_id|>
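
As far as I can tell, this raw string is what the model's chat template is supposed to produce from the list-of-dicts messages. A minimal sketch of how I'd render it myself to compare (assuming access to the gated meta-llama/Llama-3.1-8B-Instruct repo; the actual output may differ slightly, e.g. the template might inject a date into the system header):

from transformers import AutoTokenizer

# Hypothetical check against the gated Llama 3.1 Instruct repo
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

llama_messages = [
    {"role": "system", "content": "Environment: ipython"},
    {"role": "user", "content": "Write code to check if number is prime, use that to see if the number 7 is prime"},
]

# tokenize=False returns the raw prompt string instead of token ids;
# add_generation_prompt=True appends the assistant header that cues the reply
prompt = tokenizer.apply_chat_template(
    llama_messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Should closely match the <|begin_of_text|>...<|start_header_id|> format above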

I’m quite confused about how to handle this properly. My assumption is that the pipeline class takes the list-of-dicts ‘dialogue’ format and automatically applies the model-specific formatting (the chat template) during inference. Is that correct?
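
If so, I suppose I can verify what the pipeline actually feeds the model by rendering the chat template directly from the pipeline's own tokenizer. A rough sketch of what I mean, reusing the Gemma pipe and messages from above:

# The pipeline's tokenizer carries the model's chat template, so rendering it
# directly should show the exact string the pipeline passes to the model
rendered = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(rendered)
# I'd expect to see Gemma's <start_of_turn>user ... <end_of_turn> markers here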

Which docs should I look at to clear this up for myself, and what guidelines do you recommend?

Thanks!
