Repetitive Answers From Fine-Tuned LLM

Hello everyone, I fine-tuned the Llama 3.2-1B model on a Turkish dataset.

Sometimes it gives repetitive answers, even though our dataset does not contain any repeated lines. The same problem occurs with every model I have fine-tuned, so I must be making a mistake somewhere, but I don't know where. Here are the model and the dataset if you want to inspect them:

You may not be able to judge this since it's in Turkish, but the combined dataset is somewhat "synthetic" or "direct" in style, and the answers are not very user-friendly. Maybe the problem is related to the dataset.

Here is the code:

from huggingface_hub import notebook_login
notebook_login()  # meta-llama models are gated, so a Hugging Face access token is needed

# %%
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# %%
from peft import LoraConfig
from transformers import BitsAndBytesConfig

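lora_config = LoraConfig(
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",  # assumed: the remaining LoRA arguments were cut off in the post, so PEFT defaults apply
)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # assumed: the original quantization arguments were cut off in the post
    bnb_4bit_compute_dtype=torch.bfloat16,
)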

# %%
from transformers import AutoTokenizer, AutoModelForCausalLM

modelName = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(modelName)
model = AutoModelForCausalLM.from_pretrained(modelName, quantization_config=bnb_config, device_map="auto")

# %%
from datasets import load_dataset
dataset = load_dataset("myzens/alpaca-turkish-combined", split="train")
dataset, dataset[0]

# %%
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

bos_token = tokenizer.bos_token
eos_token = tokenizer.eos_token

# pad with a reserved special token (id 128002) so that the pad token differs from EOS
tokenizer.pad_token_id = 128002
pad_token = tokenizer.pad_token

# %%
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, inp, output in zip(instructions, inputs, outputs):
        # EOS marks the end of each sample so the model can learn when to stop
        text = bos_token + alpaca_prompt.format(instruction, inp, output) + eos_token
        texts.append(text)
    return { "text" : texts, }

# %%
dataset = dataset.map(formatting_prompts_func, batched = True)

# %%

# %%
from transformers import TrainingArguments

train_args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #max_steps = 150,
        num_train_epochs = 1,
        gradient_checkpointing = True,
        learning_rate = 2e-4,
        bf16 = True,
        logging_steps = 250,
        optim = "adamw_hf",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        output_dir = "llama3.2-1b-tr",
)

# %%
from trl import SFTTrainer

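# older trl API: recent trl releases expect dataset_text_field/packing in SFTConfig and processing_class instead of tokenizer
trainer = SFTTrainer(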
    model = model,
    tokenizer = tokenizer,
    args = train_args,
    peft_config = lora_config,
    train_dataset = dataset,
    dataset_text_field = "text",
    packing = False,
)

# %%
trainer.train()

What am I doing wrong?
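One thing worth ruling out when a fine-tuned model keeps generating is the data formatting: the model only learns to stop if every training sample actually ends with the EOS token. Below is a minimal, tokenizer-free sketch of such a check, assuming the formatting function from the post (with each formatted string appended to the list). The BOS/EOS strings are hypothetical stand-ins; in the real notebook they come from tokenizer.bos_token and tokenizer.eos_token, and check_formatted_batch is an illustrative helper, not a library function.

```python
# Hypothetical stand-ins for tokenizer.bos_token / tokenizer.eos_token
# (assumed to match the strings used by the base Llama 3.2 tokenizer).
BOS = "<|begin_of_text|>"
EOS = "<|end_of_text|>"

ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def formatting_prompts_func(examples, bos_token=BOS, eos_token=EOS):
    """Batched formatter: one prompt string per example, EOS appended."""
    texts = []
    for instruction, inp, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(bos_token + ALPACA_PROMPT.format(instruction, inp, output) + eos_token)
    return {"text": texts}


def check_formatted_batch(batch, eos_token=EOS):
    """Return readable problems found in a formatted batch (empty list if none)."""
    problems = []
    for i, text in enumerate(batch["text"]):
        if not text.endswith(eos_token):
            problems.append(f"sample {i}: missing trailing EOS")
        elif text.count(eos_token) > 1:
            problems.append(f"sample {i}: EOS appears more than once")
    return problems


batch = {"instruction": ["Soru"], "input": [""], "output": ["Cevap"]}
formatted = formatting_prompts_func(batch)
print(check_formatted_batch(formatted))  # → []
```

With the real tokenizer the same idea can be applied to token ids: tokenizer(text)["input_ids"] should end with tokenizer.eos_token_id, and if it starts with two copies of tokenizer.bos_token_id, the BOS prepended by hand is probably being duplicated by the tokenizer.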


What other models have you tried that yield similar results?

I tried Gemma 2B, Gemma 7B, Gemma 1.1 7B and Llama 3 8B, so I think the mistake is on my side.

Not an expert, but I had similar problems with some other models, and passing repetition_penalty with a value greater than 1 to the generate function helped.

(All the Hugging Face methods are super confusing 🙂)
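To make the repetition_penalty suggestion concrete, here is a small pure-Python sketch of the CTRL-style rule that, as far as I know, transformers applies when repetition_penalty is set: for every token already in the sequence, a positive logit is divided by the penalty and a negative logit is multiplied by it. The function name and the toy logits are hypothetical; this is an illustration of the idea, not the library's code.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], previous_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a penalized copy of `logits` (one next-token score per vocab id).

    For every token id already in the sequence, a positive score is divided by
    `penalty` and a negative score is multiplied by it, so with penalty > 1 a
    repeated token always becomes less likely.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)                   # leave the caller's list untouched
    for token_id in set(previous_ids):   # each seen token is penalized once
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


# Toy vocabulary of 4 tokens; tokens 0 and 2 already appeared in the sequence.
logits = [2.0, 1.8, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, previous_ids=[0, 2, 0], penalty=1.2)
greedy_before = max(range(len(logits)), key=lambda i: logits[i])
greedy_after = max(range(len(penalized)), key=lambda i: penalized[i])
print(greedy_before, greedy_after)  # → 0 1
```

In practice there is no need to write this yourself: passing, for example, repetition_penalty=1.2 to model.generate(...) enables the built-in processor. Values only slightly above 1 are a sensible first try, because the penalty also hits tokens that legitimately recur (punctuation, common words), and in transformers it is applied to every token already in input_ids, prompt included.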


Use max_steps and train for about 2-5k steps.
Also, set repetition_penalty in your text generation settings.
See: Text generation strategies (Transformers documentation)
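The 2-5k step suggestion is easier to judge once it is converted into epochs. A rough sketch, assuming a single GPU and the settings from the post (per_device_train_batch_size = 4, gradient_accumulation_steps = 4): each optimizer step consumes batch size × accumulation steps × number of devices examples. The dataset size below is hypothetical, since the post does not state it, and the helper names are illustrative only.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum * num_devices


def approx_steps_per_epoch(num_examples: int, per_device_batch: int,
                           grad_accum: int, num_devices: int = 1) -> int:
    """Rough optimizer steps in one epoch (the Trainer may round the last partial batch differently)."""
    return math.ceil(num_examples / effective_batch_size(per_device_batch, grad_accum, num_devices))


def epochs_covered(max_steps: int, num_examples: int, per_device_batch: int,
                   grad_accum: int, num_devices: int = 1) -> float:
    """How many passes over the data a given max_steps amounts to."""
    return max_steps * effective_batch_size(per_device_batch, grad_accum, num_devices) / num_examples


# NUM_EXAMPLES is a hypothetical dataset size; replace it with len(dataset).
NUM_EXAMPLES = 50_000
print(approx_steps_per_epoch(NUM_EXAMPLES, 4, 4))  # → 3125
for steps in (2_000, 5_000):
    print(steps, round(epochs_covered(steps, NUM_EXAMPLES, 4, 4), 2))
```

In train_args this corresponds to re-enabling the commented-out max_steps line with a value in that range; according to the TrainingArguments documentation, a positive max_steps overrides num_train_epochs.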
