I am running the sample code from the Text classification tutorial.
When I set `num_train_epochs` to 5, the training loss is:
| Step | Training Loss |
|---|---|
| 500 | 0.320200 |
| 1000 | 0.246800 |
| 1500 | 0.230600 |
| 2000 | 0.171200 |
| 2500 | 0.160800 |
| 3000 | 0.152400 |
| 3500 | 0.102400 |
| 4000 | 0.085700 |
| 4500 | 0.098600 |
| 5000 | 0.066400 |
| 5500 | 0.050800 |
| 6000 | 0.045400 |
| 6500 | 0.033500 |
| 7000 | 0.030500 |
| 7500 | 0.030600 |
But when I set `num_train_epochs` to 1, the loss is:
| Step | Training Loss |
|---|---|
| 500 | 0.338900 |
| 1000 | 0.242900 |
| 1500 | 0.212500 |
If I run it again with `num_train_epochs` = 1, I get exactly the same loss:
| Step | Training Loss |
|---|---|
| 500 | 0.338900 |
| 1000 | 0.242900 |
| 1500 | 0.212500 |
So my question is: why do different values of `num_train_epochs` give different losses at the same steps? How can I make them consistent, i.e., how can I get the same first three losses with `num_train_epochs` = 1 as with `num_train_epochs` = 5?
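My guess (an assumption on my part, not something I have verified) is that the learning-rate schedule depends on the *total* number of training steps, which changes with `num_train_epochs`, so the same step gets a different learning rate. Here is a minimal sketch of a linear decay with no warmup, using a hypothetical base LR of 2e-5 and a made-up steps-per-epoch count, just to illustrate the effect:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Linearly decay base_lr to 0 over total_steps (no warmup).

    base_lr and the schedule shape are assumptions for illustration,
    not necessarily what the tutorial's Trainer actually uses.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)

steps_per_epoch = 1563  # hypothetical value, only for illustration

for step in (500, 1000, 1500):
    # The same step sees a different LR depending on total epochs,
    # so the loss curves would diverge even with identical seeds.
    lr_1_epoch = linear_lr(step, 1 * steps_per_epoch)
    lr_5_epochs = linear_lr(step, 5 * steps_per_epoch)
    print(f"step {step}: 1-epoch LR = {lr_1_epoch:.3e}, "
          f"5-epoch LR = {lr_5_epochs:.3e}")
```

If that guess is right, the early losses would only match if the schedule (and hence the total step count) were made identical between the two runs.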