During training I get the loss reported by the trainer itself, and I also have a custom callback that calculates the loss on the training data every epoch. But I see different values between the two:
{'loss': 0.5845, 'learning_rate': 9.318637274549099e-05, 'epoch': 3.5}
{'loss': 0.5847, 'learning_rate': 9.268537074148296e-05, 'epoch': 3.75}
{'loss': 0.5779, 'learning_rate': 9.218436873747496e-05, 'epoch': 4.0}
{'train_loss': 0.7955134510993958, 'train_accuracy': 0.58578125}  ← calculated by me
This is the callback that calculates the loss:
from copy import deepcopy

from transformers import TrainerCallback


class TrainDataEvalCallback(TrainerCallback):
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer
        self._steps_evaluate_training = 100

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            self._trainer.model.eval()
            control_copy = deepcopy(control)
            # Temporarily shrink the training dataset so evaluation only
            # runs over self._steps_evaluate_training steps, then restore it.
            training_steps_number = self._trainer.train_dataset._steps_number
            self._trainer.train_dataset._steps_number = self._steps_evaluate_training
            self._trainer.evaluate(
                eval_dataset=self._trainer.train_dataset,
                metric_key_prefix="train",
            )
            self._trainer.train_dataset._steps_number = training_steps_number
            self._trainer.model.train()
            return control_copy
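To make the save/shrink/evaluate/restore sequence in the callback concrete, here is a minimal pure-Python sketch of that pattern, with no transformers dependency. `DummyTrainer` and `DummyDataset` are hypothetical stand-ins, not the real API; they only record how many steps `evaluate()` actually saw:

```python
class DummyDataset:
    """Hypothetical stand-in for the training dataset."""
    def __init__(self, steps_number):
        self._steps_number = steps_number


class DummyTrainer:
    """Hypothetical stand-in that records how many steps evaluate() used."""
    def __init__(self, dataset):
        self.train_dataset = dataset
        self.evaluated_steps = None

    def evaluate(self, eval_dataset, metric_key_prefix="eval"):
        self.evaluated_steps = eval_dataset._steps_number


trainer = DummyTrainer(DummyDataset(steps_number=5000))

# Same sequence as in on_epoch_end:
saved = trainer.train_dataset._steps_number        # save the full step count
trainer.train_dataset._steps_number = 100          # shrink to the eval budget
trainer.evaluate(trainer.train_dataset, metric_key_prefix="train")
trainer.train_dataset._steps_number = saved        # restore afterwards

print(trainer.evaluated_steps)                 # 100
print(trainer.train_dataset._steps_number)     # 5000
```

Note that the evaluation pass only ever sees 100 steps of the training data, i.e. a different (and much smaller) sample than the full epoch the trainer's logged `loss` is averaged over.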
Any idea why the numbers are so different? It happens across all the epochs; somehow the callback consistently reports higher values.