Plateau in Eval Loss after 100 steps in DPO Training

I recently expanded my dataset to include broader and higher-quality data for DPO training of an LLM. Despite this, I’m seeing the eval loss curve consistently flatten after exactly 100 steps, every single run, with no improvement in model performance afterwards. The same happens with all the evaluation metrics, including the reward metrics. I’ve tried various adjustments to the DPO beta and the learning rate, but the issue persists. The plateau occurs despite the richer dataset, which, to my understanding, should lead to better learning and performance.
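
For context, here is a minimal sketch of the kind of setup I’m describing (not my exact script). I’m assuming TRL’s `DPOConfig`/`DPOTrainer` API; the model name, data file, and hyperparameter values below are placeholders, and some argument names differ slightly between library versions:

```python
# Illustrative DPO setup sketch -- placeholders only, not my exact training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "my-base-model"  # placeholder for the actual checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder for my preference data (prompt / chosen / rejected columns)
dataset = load_dataset("json", data_files="preference_pairs.json")["train"]
split = dataset.train_test_split(test_size=0.05)
train_dataset, eval_dataset = split["train"], split["test"]

training_args = DPOConfig(
    output_dir="dpo-output",
    beta=0.1,                       # one of several beta values I've tried
    learning_rate=5e-7,             # likewise swept over a few values
    per_device_train_batch_size=4,
    num_train_epochs=1,
    eval_strategy="steps",          # "evaluation_strategy" in older versions
    eval_steps=50,                  # eval loss goes flat after ~100 train steps
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,     # "tokenizer=" in older TRL versions
)
trainer.train()
```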

Has anyone else faced a similar issue when introducing a more diverse dataset? Could there be a hidden factor in the training dynamics that might be causing this? Any insights or suggestions for troubleshooting this would be greatly appreciated.

Thank you!