Understanding repetition_penalty in LLaMA-2 Pretrained Model

Hi, I am trying to use the meta-llama/Llama-2-7b-chat-hf model for text generation.

import torch
import transformers
from transformers import AutoTokenizer

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'Tell me the largest country in the world.\n',
    do_sample=True,
    top_k=10,
    repetition_penalty=1.1,   # the hyperparameter I am asking about
    num_return_sequences=1,
    max_length=200,
)
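
To inspect the output I just loop over the returned list; each item from the text-generation pipeline is a dict with a generated_text field:

for seq in sequences:
    print(f"Result: {seq['generated_text']}")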

Regarding the hyperparameter repetition_penalty: I understand that a higher repetition_penalty promotes the generation of more diverse tokens, but I am looking for a more quantitative explanation of its mechanism, for example its value range and which value corresponds to no penalty.

OpenAI has documented how the frequency and presence penalties influence the token probability distribution in its chat.completions API. The formula it provides is below; if the frequency and presence penalties are both set to 0, there is no penalty on repetition.

mu[j] -> mu[j] - c[j] * alpha_frequency - float(c[j] > 0) * alpha_presence
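
To make sure I read that correctly, here is a small sketch of how I understand the update (my own illustration, not OpenAI's code): the logit of every token that has already appeared is reduced by c[j] * alpha_frequency plus a flat alpha_presence, so setting both alphas to 0 leaves the distribution unchanged.

import torch

# Hypothetical helper illustrating the OpenAI formula above (not OpenAI code):
# each token's logit drops by c[j] * alpha_frequency plus a flat alpha_presence
# if the token has appeared at least once; with both alphas at 0 nothing changes.
def apply_openai_penalties(logits, counts, alpha_frequency=0.0, alpha_presence=0.0):
    return logits - counts * alpha_frequency - (counts > 0).float() * alpha_presence

logits = torch.tensor([2.0, 1.0, 0.5])   # raw scores for a 3-token vocabulary
counts = torch.tensor([2.0, 0.0, 0.0])   # token 0 has already appeared twice
print(apply_openai_penalties(logits, counts, alpha_frequency=0.5, alpha_presence=0.3))
# tensor([0.7000, 1.0000, 0.5000]) -> only the repeated token is pushed down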

However, I haven't come across a similar mathematical description of repetition_penalty for LLaMA-2 (including in its research paper). Could anyone provide insights?
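
For what it's worth, my current guess is that it follows the penalized sampling from the CTRL paper (Keskar et al., 2019): the logit of any token already present in the context is divided by repetition_penalty if positive and multiplied by it if negative, so 1.0 would mean no penalty. A rough sketch of that reading (my own illustration, not the actual transformers implementation):

import torch

# Hypothetical sketch of how I think repetition_penalty might work:
# for every token id already in the context, divide a positive logit by the penalty
# (or multiply a negative one), so penalty=1.0 leaves the logits untouched.
def apply_repetition_penalty(logits, context_token_ids, penalty=1.1):
    scores = logits.clone()
    prev = scores[context_token_ids]
    scores[context_token_ids] = torch.where(prev < 0, prev * penalty, prev / penalty)
    return scores

logits = torch.tensor([3.0, -1.0, 0.5, 2.0])
seen = torch.tensor([0, 1])   # token ids 0 and 1 are already in the context
print(apply_repetition_penalty(logits, seen, penalty=1.1))
# tensor([ 2.7273, -1.1000,  0.5000,  2.0000])

If that is right, the penalty acts multiplicatively on the logits (values above 1.0 discourage repetition, values between 0 and 1 would encourage it), rather than additively like OpenAI's alphas, but I'd appreciate confirmation or a pointer to where this is actually defined.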
