Unable to train model (Loss is 0.000000)

I am trying to fine-tune an LLM (OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5) on my own data.

import torch
from transformers import DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the base model in 8-bit, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
                                             load_in_8bit=True,
                                             device_map="auto")

from datasets import load_dataset

# Load the dataset
dataset = load_dataset('parquet', data_files='data/dataset.parquet')

# Tokenize and format the dataset
def tokenize_function(examples):
    return tokenizer(examples['TEXT'], truncation=True, max_length=128, padding='max_length')


tokenized_dataset = dataset.map(tokenize_function, batched=True)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=100,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=4
)



# Causal-LM collator: labels are a copy of input_ids, with pad tokens set to -100
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False,
)

# Create the Trainer and train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    data_collator=data_collator,
)

trainer.train()

# Save the trained model
trainer.save_model("model")  # replace with the path where you want to save the model
tokenizer.save_pretrained("model")

Now the issue is that the training loss stays at 0.000000, which suggests something is wrong with my training. Also, when I load the trained model, no answers come out at all (which should not be the case). Finally, the downloaded base model is 23 GB on disk, but my saved model is only 9.6 GB.
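
My assumption (not confirmed) is that the size gap just reflects the 8-bit weights from load_in_8bit=True rather than missing parameters; a rough way to check the in-memory footprint:

# Reports the model's in-memory size in bytes (assumption: ~9.6 GB on disk
# corresponds to 8-bit weights, not dropped layers).
print(model.get_memory_footprint() / 1e9, "GB")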

My raw data is in CSV, which I have converted to parquet. The dataset has 3 columns (TEXT, source, metadata) and only 12 rows.

This is how I generated the parquet file:

import pandas as pd

df = pd.read_csv('data/data.csv')
df.to_parquet("data/dataset.parquet", row_group_size=100, engine="pyarrow", index=False)
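
A quick check (sketch; assumes the file is at data/dataset.parquet as above) that the TEXT column survived the CSV-to-parquet round trip:

from datasets import load_dataset

ds = load_dataset('parquet', data_files='data/dataset.parquet')
print(ds['train'].column_names)       # expecting ['TEXT', 'source', 'metadata']
print(ds['train'][0]['TEXT'][:200])   # first example, truncated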

Hey Banak, I am getting the same issue when trying to fine-tune a model with QLoRA: after 200 steps my loss is 0.000000. Did you have any luck resolving this?

I am getting the same problem with Mistral.