Unexpected behavior of load_best_model_at_end in Trainer (or am I doing it wrong?)

For me the Trainer doesn’t load the best model at the end but the latest one instead. I set load_best_model_at_end=True and also tried specifying metric_for_best_model="eval_loss" and greater_is_better=False. Is anybody experiencing the same? I infer that it’s the newest rather than the best model because running trainer.evaluate() after training does not return the lowest eval_loss seen during training. I am using the newest transformers version. Thank you for your help!

This is my code:

    trainer = Trainer(model=model,
                      args=training_args,
                      data_collator=data_collator,
                      train_dataset=tokenized_dataset["train"],
                      eval_dataset=tokenized_dataset["test"],
                      compute_metrics=compute_metrics,
                      callbacks=[early_stopping_callback, csv_logger_callback],
                      preprocess_logits_for_metrics=preprocess_logits_for_metrics)

    trainer.train()
    eval_results = trainer.evaluate()
    logging.info("Final evaluation results on validation set are:\n" + json.dumps(eval_results, indent=2))

And this is my training_args:

training_arguments:
load_best_model_at_end: True
metric_for_best_model: "eval_loss"
greater_is_better: False
max_steps: 100000
per_device_train_batch_size: 2048
per_device_eval_batch_size: 2048
optim: "schedule_free_adamw"
lr_scheduler_type: "constant"
learning_rate: 0.001
weight_decay: 0.00001
fp16: True
eval_strategy: "steps"
save_strategy: "steps"
eval_steps: 500
save_steps: 500
dataloader_num_workers: 32
dataloader_pin_memory: True
warmup_steps: 1000
tf32: True
torch_compile: True
torch_compile_backend: "inductor"
eval_on_start: True
eval_accumulation_steps: 8
save_total_limit: 2
gradient_accumulation_steps: 1
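For reference, the checkpoint selection these settings request can be sketched in plain Python. This is a simplified illustration of the idea, not the actual transformers internals:

```python
# Simplified sketch of what load_best_model_at_end does conceptually
# (not the actual transformers implementation).
def select_best_checkpoint(history, greater_is_better=False):
    """history: list of (checkpoint_step, metric_value) pairs,
    one recorded per eval/save step."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda pair: pair[1])

# With metric_for_best_model="eval_loss" and greater_is_better=False,
# the checkpoint with the lowest recorded eval_loss is selected:
history = [(500, 0.91), (1000, 0.74), (1500, 0.79), (2000, 0.81)]
print(select_best_checkpoint(history))  # -> (1000, 0.74)
```

So if the evaluation itself is deterministic, trainer.evaluate() after training should reproduce the lowest logged eval_loss.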


Never mind, the issue was simply that my evaluation loop wasn’t deterministic (because of random masking). The Trainer does select the best model, but re-running trainer.evaluate() doesn’t necessarily reproduce the lowest eval_loss that was logged during training.
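To make that concrete: if the masking applied during evaluation is driven by an unseeded RNG, two evaluations of the same checkpoint can mask different positions and therefore report different losses. A minimal toy sketch of the effect (hypothetical mask_tokens helper, not the actual data collator):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    # Toy stand-in for MLM-style random masking: with a fixed seed the
    # masked positions (and hence the eval loss) are reproducible.
    rng = random.Random(seed)
    return [tok if rng.random() >= mask_prob else "[MASK]" for tok in tokens]

tokens = [f"tok{i}" for i in range(20)]

# Unseeded: each call may mask different positions -> non-deterministic eval.
# Seeded: identical masking on every call -> deterministic, comparable eval.
assert mask_tokens(tokens, seed=42) == mask_tokens(tokens, seed=42)
```

Fixing the masking (e.g. pre-masking the eval set once, or seeding the collator per evaluation) makes eval_loss values comparable across runs.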
