Repetitive Answers From Fine-Tuned LLM

Hello people, I fine-tuned the Llama 3.2-1B model with a Turkish dataset.

Sometimes it gives repetitive answers. Our dataset does not contain any repetitive lines, and this problem occurs with every model that I fine-tune. I am making a mistake somewhere but don’t know where. Here are the model and dataset first, if you want to inspect them:

You may not understand it since it’s Turkish, but the dataset we combined is somewhat “synthetic” or “direct”; the answers are not very user-friendly. Maybe the problem is related to the dataset.

Here is the code:

from huggingface_hub import notebook_login
notebook_login()

# %%
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

# %%
from peft import LoraConfig
from transformers import BitsAndBytesConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# %%
from transformers import AutoTokenizer, AutoModelForCausalLM

modelName = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(modelName)
model = AutoModelForCausalLM.from_pretrained(modelName, quantization_config=bnb_config, device_map="auto")

# %%
from datasets import load_dataset
dataset = load_dataset("myzens/alpaca-turkish-combined", split="train")
dataset, dataset[0]

# %%
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

bos_token = tokenizer.bos_token
eos_token = tokenizer.eos_token

# Pad with a dedicated reserved token instead of reusing eos_token; if
# pad == eos, most collators mask the EOS in the labels and the model may
# never learn to stop generating. 128002 is a reserved special token in
# the Llama 3 vocabulary.
tokenizer.pad_token_id = 128002
pad_token = tokenizer.pad_token

# %%
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Wrap each example in BOS/EOS so the model sees where answers end.
        text = bos_token + alpaca_prompt.format(instruction, input_text, output) + eos_token
        texts.append(text)
    return { "text" : texts, }

# %%
dataset = dataset.map(formatting_prompts_func, batched = True)

# %%
print(dataset["text"][0])

# %%
from transformers import TrainingArguments

train_args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #max_steps = 150,
        num_train_epochs = 1,
        gradient_checkpointing = True,
        learning_rate = 2e-4,
        bf16 = True,
        logging_steps = 250,
        optim = "adamw_torch",  # "adamw_hf" is deprecated in recent transformers
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        output_dir = "llama3.2-1b-tr",
)

# %%
from trl import SFTTrainer

# Note: depending on your TRL version, dataset_text_field and packing may
# need to move into an SFTConfig, and tokenizer may be processing_class.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    args = train_args,
    peft_config = lora_config,
    train_dataset = dataset,
    dataset_text_field = "text",
    packing = False,
)
trainer.train()

# %%
model.push_to_hub("emre570/llama3.2-1b-tr-qlora")
tokenizer.push_to_hub("emre570/llama3.2-1b-tr-qlora")

What am I doing wrong?


What other models have you tried that yield similar results?

I tried Gemma 2B and 7B, Gemma 1.1 7B, and Llama 3 8B. And I think the mistake is mine.

Not an expert, but I had similar problems with some other models, and passing repetition_penalty to the generate function with a value larger than 1 helped.
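Something like this, as a minimal sketch (assuming model, tokenizer, and tokenized inputs already exist):

outputs = model.generate(
    **inputs,
    repetition_penalty=1.2,  # values > 1.0 penalize tokens that already appeared
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))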

All these Hugging Face methods are super confusing :slight_smile:


Use max_steps and train for about 2-5k steps.
Also, use repetition_penalty in your text generation strategy.
See: Text generation strategies (huggingface.co)
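For example, a minimal sketch with max_steps (2000 here is just a value in that range, not a tuned number):

from transformers import TrainingArguments

train_args = TrainingArguments(
    output_dir="llama3.2-1b-tr",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=2000,  # overrides num_train_epochs when set
    learning_rate=2e-4,
)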


Hi, sorry for the late answer, but I saw this recently and got curious. What’s the logic behind it?


I’m having a similar issue here. Setting repetition_penalty does in fact help with the model repeating itself, but instead of ending, it just generates new text.

My issue is that the model doesn’t stop; it just generates new text over and over again.

trainer = SFTTrainer(
    model=model,
    train_dataset=ds["train"],
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="models/email-tuning",
        num_train_epochs=2,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        logging_steps=10,
        save_steps=10,
        save_strategy="steps",
        report_to="wandb",
        run_name="email-tuning-v3-llama-3.1-8b-instruct-2048",
        max_length=2048,
    ),
)

trainer.train()

This is my training loop. I’m training meta-llama/Llama-3.1-8B-Instruct with a dataset of messages with the roles system, assistant, and user. Example:

chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]
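For reference, a rough sketch of how an example like this gets rendered (assuming the stock Llama 3.1 Instruct chat template, which ends every turn with <|eot_id|>); if that end-of-turn token is missing from the training text or gets masked as padding, the model never learns where to stop:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
text = tokenizer.apply_chat_template(chat, tokenize=False)
print(text)  # each turn should end with <|eot_id|>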

Hmm… How about max_new_tokens?

outputs = model.generate(
    **inputs,
    top_p=0.9,
    temperature=0.7,
    repetition_penalty=1.2,
    max_new_tokens=512,
    do_sample=True,
)
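If it still runs past the end of the answer, passing the end-of-turn token explicitly might also help (a sketch, assuming a Llama 3.1 chat model whose template uses <|eot_id|>):

eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=[tokenizer.eos_token_id, eot_id],
)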

Yeah, I’ve tried that, and now the model just stops mid-sentence once it hits the max_new_tokens limit.


Quite an interesting thing happened here: I didn’t change anything except the model, switching from Llama-3.1-8B-Instruct to Qwen2.5-7B-Instruct, and the problem stopped. The training works fine now.
