Why do I get `epoch = 1` at the end of each training?

gsakkas · January 25, 2024, 3:34am

Continuing off this topic (@dblakely thank you for your answers).

When I set `max_steps` high enough that multiple epochs of training (more precisely, fine-tuning starcoderbase-3b) should occur, only 1 epoch is reported at the end. It also doesn’t seem that this logger prints anything, maybe because I am using wandb (as in the finetune script from the starcoder repo)?

The output at the very end of the training is this:

{'loss': 0.0569, 'learning_rate': 0.0, 'epoch': 1.0}
{'eval_loss': 0.49919337034225464, 'eval_runtime': 40.2624, 'eval_samples_per_second': 1.267, 'eval_steps_per_second': 1.267, 'epoch': 1.0}
{'train_runtime': 328146.1378, 'train_samples_per_second': 0.244, 'train_steps_per_second': 0.061, 'train_loss': 0.2867981531023979, 'epoch': 1.0}

This is followed by a summary from wandb. Based on the previous post, I expected to see more than 10 epochs.
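
As a sanity check on the numbers in that last log line, here is a minimal sketch that recovers the approximate number of optimizer steps and samples processed from the reported runtime and throughput. The rates are rounded to three decimals in the log, so the results are estimates only. The `dataset_size` value in the final call is purely hypothetical (my real dataset size is not shown here); it only illustrates how an epoch count would follow from the number of samples seen.

```python
# Final training summary, copied from the last log line above.
summary = {
    "train_runtime": 328146.1378,        # seconds
    "train_samples_per_second": 0.244,   # rounded in the log
    "train_steps_per_second": 0.061,     # rounded in the log
}


def estimate_progress(summary):
    """Return (steps, samples, samples_per_step) implied by the summary."""
    steps = summary["train_runtime"] * summary["train_steps_per_second"]
    samples = summary["train_runtime"] * summary["train_samples_per_second"]
    return steps, samples, samples / steps


def implied_epochs(samples_seen, dataset_size):
    """Number of passes over a dataset of `dataset_size` samples."""
    return samples_seen / dataset_size


steps, samples, per_step = estimate_progress(summary)
print(f"approx. steps:      {steps:.0f}")
print(f"approx. samples:    {samples:.0f}")
print(f"samples per step:   {per_step:.1f}")

# Hypothetical dataset size, for illustration only.
dataset_size = 8000
print(f"implied epochs:     {implied_epochs(samples, dataset_size):.1f}")
```

If the number of samples seen is several times larger than the dataset, then the data really was traversed more than once, and the `'epoch': 1.0` in the log cannot be counting passes over the data.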
