Finetuning Meta-Llama-3.1-8B using PEFT

I’ve been working on code to fine-tune (locally) the meta-llama/Meta-Llama-3.1-8B or meta-llama/Meta-Llama-3.1-8B-Instruct model, which I downloaded to my local server, using the example from the Hugging Face repository huggingface/huggingface-llama-recipes. I made some adjustments to the code to fit my custom dataset.

While the training process seems to complete successfully, I’m encountering an issue during inference: the model generates gibberish responses, even for general questions.
Sample output

Question: What is the capital of France?
Answer: What is the capital of France?ders Шев麦 Roy Yoursacht_NCdersders Rakiset Roy

I’m not sure what might be wrong with the code or if I made any mistakes in the implementation.
Should I consider using a different model, or are there additional steps I can take to make the fine-tuning work correctly?

Note: I downloaded the model by excluding the original checkpoints and retrieving only the main folder:

huggingface-cli download meta-llama/Meta-Llama-3.1-8B --exclude "original/*" --local-dir Meta-Llama-3.1-8B

This is the updated code:

import torch
import json
from datasets import load_dataset, Dataset

from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

def load_and_flatten_dataset(file_path):
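    """Flatten the nested context/question/answer JSON into one "text" string per QA pair."""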
    with open(file_path, "r") as f:
        raw_data = json.load(f)
    formatted_data = []
    for entry in raw_data:
        context = entry['context']
        for q in entry['questions']:
            formatted_data.append({
                "text": f"Context: {context}\nQuestion: {q['question']}\nAnswer: {q['answer']}"
            })

    return Dataset.from_dict({"text": [d['text'] for d in formatted_data]})

def tokenize_dataset(dataset, tokenizer):

    def tokenize_function(qa_data):
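        # Tokenize each "text" example, padding/truncating to a fixed length of 512 tokens.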
        inputs = tokenizer(
            qa_data['text'], 
            padding="max_length",  
            truncation=True,
            max_length=512,        
            return_tensors="pt"
        )

        return inputs

    return dataset.map(tokenize_function, batched=True)


dataset_file = "data/questions_answers.json"
model_path = "./Meta-Llama-3.1-8B"
dataset = load_and_flatten_dataset(dataset_file)
tokenizer = AutoTokenizer.from_pretrained(model_path)

if tokenizer.pad_token is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.pad_token = tokenizer.eos_token 

tokenized_datasets = tokenize_dataset(dataset, tokenizer)
#dataset = load_dataset("imdb", split="train")

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,  
    logging_dir='./logs',
    logging_steps=10,
    gradient_accumulation_steps=4,  
    eval_strategy="epoch",
    save_strategy="epoch",
    fp16=True,  
    ddp_find_unused_parameters=False,  
    report_to="none",  
)

QLoRA = True
if QLoRA:
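    # QLoRA: load the base model in 4-bit and train a LoRA adapter on top of it.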
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4"
    )
    
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quantization_config,
        device_map="auto"  
    )

    lora_config = LoraConfig(
        r=8,
        target_modules="all-linear",
        bias="none",
        task_type="CAUSAL_LM",
    )
else:
    model = AutoModelForCausalLM.from_pretrained(model_path)
    lora_config = None


trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
    dataset_text_field="text",
)

trainer.train()


output_dir="./fine-tuned-llama"

model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

The JSON file contains the following sample data:

   {
        "context": "On 05 May 2019, Anne Johnson joined QandA Technologies. She was assigned Employee Number 1000 and the username gbuch. She was born on 12 May 1980. Her father's name is Peter Johnson and mother's name is Diana Johnson. She holds the position of Graphic Designer. For contact, her mobile number is 7344186426.",
        "questions": [
            {
                "question": "When did Anne Johnson join QandA Technologies?",
                "answer": "05 May 2019"
            },
            {
                "question": "What is Anne Johnson's employee number?",
                "answer": "1000"
            },
            {
                "question": "What is Anne Johnson's birthdate?",
                "answer": "12 May 1980"
            },
            {
                "question": "What is Anne Johnson's father's name?",
                "answer": "Peter Johnson"
            },
            {
                "question": "What is Anne Johnson's mother's name?",
                "answer": "Diana Johnson"
            },
            {
                "question": "What is Anne Johnson's job position?",
                "answer": "Graphic Designer"
            }
        ]
    },
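
For reference, each question in an entry like this gets flattened into one training string by load_and_flatten_dataset, roughly like this:

ds = load_and_flatten_dataset("data/questions_answers.json")
print(ds[0]["text"])
# Context: On 05 May 2019, Anne Johnson joined QandA Technologies. ...
# Question: When did Anne Johnson join QandA Technologies?
# Answer: 05 May 2019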

There have been issues with the Llama 3.1 8B tokenizer, so I would check if the special tokens (namely the EOS token) have been correctly added to the tokenized dataset.
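
For example, something along these lines (reusing the variable names from your script) would show whether the EOS token actually ends up in the raw text and in the tokenized ids:

sample_text = dataset[0]["text"]
print(repr(sample_text))  # does the raw training text end with tokenizer.eos_token?
print("ends with EOS:", sample_text.endswith(tokenizer.eos_token))

sample_ids = tokenized_datasets[0]["input_ids"]
print(tokenizer.decode(sample_ids))  # note: pad_token == eos_token here, so the padding also decodes as EOS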

@Chahnwoo
I tried adding the EOS token manually at the end of each text line.

EOS_TOKEN = tokenizer.eos_token
"text": f"Context: {context}\nQuestion: {q['question']}\nAnswer: {q['answer']} + EOS_TOKEN"

But it still showed the same result.

The formatted string you’ve shared here doesn’t actually add an eos_token to the end of the text. Instead, it just appends the literal characters " + EOS_TOKEN", because the variable reference sits inside the quotes rather than in an f-string placeholder. I would double-check that first and foremost.
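
Something like this (keeping your formatting) would actually append the token:

EOS_TOKEN = tokenizer.eos_token
"text": f"Context: {context}\nQuestion: {q['question']}\nAnswer: {q['answer']}{EOS_TOKEN}"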
