Understanding repetition_penalty in LLaMA-2 Pretrained Model

Hi, I am trying to use the meta-llama/Llama-2-7b-chat-hf model for text generation.

import torch
import transformers
from transformers import AutoTokenizer

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'Tell me the largest country in the world.\n',
    do_sample=True,
    top_k=10,
    repetition_penalty=1.1,   # the hyperparameter I am asking about
    num_return_sequences=1,
    max_length=200,
)
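
To inspect the output I just loop over the returned list; each item from the text-generation pipeline is a dict with a generated_text field:

for seq in sequences:
    print(f"Result: {seq['generated_text']}")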

Regarding the hyperparameter repetition_penalty: I understand that a higher repetition_penalty promotes the generation of more diverse tokens, but I am looking for a more quantitative explanation of its mechanism, for example its value range and which value corresponds to no penalty.

OpenAI has documented how the frequency and presence penalties influence the token probability distribution in its chat.completions API. The formula it provides is below; if the frequency and presence penalties are both set to 0, there is no penalty on repetition.

mu[j] -> mu[j] - c[j] * alpha_frequency - float(c[j] > 0) * alpha_presence
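
To make sure I read that correctly, here is a small sketch of how I understand the update (my own illustration, not OpenAI's code): the logit of every token that has already appeared is reduced by c[j] * alpha_frequency plus a flat alpha_presence, so setting both alphas to 0 leaves the distribution unchanged.

import torch

# Hypothetical helper illustrating the OpenAI formula above (not OpenAI code):
# each token's logit drops by c[j] * alpha_frequency plus a flat alpha_presence
# if the token has appeared at least once; with both alphas at 0 nothing changes.
def apply_openai_penalties(logits, counts, alpha_frequency=0.0, alpha_presence=0.0):
    return logits - counts * alpha_frequency - (counts > 0).float() * alpha_presence

logits = torch.tensor([2.0, 1.0, 0.5])   # raw scores for a 3-token vocabulary
counts = torch.tensor([2.0, 0.0, 0.0])   # token 0 has already appeared twice
print(apply_openai_penalties(logits, counts, alpha_frequency=0.5, alpha_presence=0.3))
# tensor([0.7000, 1.0000, 0.5000]) -> only the repeated token is pushed down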

However, I haven't come across a similar mathematical description of repetition_penalty for LLaMA-2 (including in its research paper). Could anyone provide insights?
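
For what it's worth, my current guess is that it follows the penalized sampling from the CTRL paper (Keskar et al., 2019): the logit of any token already present in the context is divided by repetition_penalty if positive and multiplied by it if negative, so 1.0 would mean no penalty. A rough sketch of that reading (my own illustration, not the actual transformers implementation):

import torch

# Hypothetical sketch of how I think repetition_penalty might work:
# for every token id already in the context, divide a positive logit by the penalty
# (or multiply a negative one), so penalty=1.0 leaves the logits untouched.
def apply_repetition_penalty(logits, context_token_ids, penalty=1.1):
    scores = logits.clone()
    prev = scores[context_token_ids]
    scores[context_token_ids] = torch.where(prev < 0, prev * penalty, prev / penalty)
    return scores

logits = torch.tensor([3.0, -1.0, 0.5, 2.0])
seen = torch.tensor([0, 1])   # token ids 0 and 1 are already in the context
print(apply_repetition_penalty(logits, seen, penalty=1.1))
# tensor([ 2.7273, -1.1000,  0.5000,  2.0000])

If that is right, the penalty acts multiplicatively on the logits (values above 1.0 discourage repetition, values between 0 and 1 would encourage it), rather than additively like OpenAI's alphas, but I'd appreciate confirmation or a pointer to where this is actually defined.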
