I have trained a model and saved it, along with its tokenizer. During training I set load_best_model_at_end to True, and the test results I see right after training are good.
Now I have a second file where I load the saved model and evaluate it on the test set, so that I don't have to retrain every time. However, the test results in this second file are worse than the ones I got right after training.
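Roughly, the second file does something like this (the paths, model class, test_dataset, and compute_metrics are placeholders, not my actual code):
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

# Placeholders: "./saved_model" and test_dataset stand in for my actual paths/data
model = AutoModelForSequenceClassification.from_pretrained("./saved_model")
tokenizer = AutoTokenizer.from_pretrained("./saved_model")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./eval_out", per_device_eval_batch_size=32),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # same metric function used during training
)
metrics = trainer.evaluate(eval_dataset=test_dataset)  # test_dataset preprocessed the same way as in training
print(metrics)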
Is there a way to load the model from the best validation checkpoint?
Yes, but I do not know a priori which checkpoint is the best. I trained the model in another file and saved some of the checkpoints. I could track down the best checkpoint from that first file, but that is not an ideal solution.
I believe an ideal solution would be to save only the best checkpoint, or to overwrite the existing checkpoint whenever the model improves, so that in the end I have just one model.
Is what I want clearer now?
Is there a way to save only the best checkpoint instead of many?
I don’t understand the question. With load_best_model_at_end the model loaded at the end of training is the one that had the best performance on your validation set. So when you save that model, you have the best model on this validation set.
If it’s crap on another set, it means your validation set was not representative of the performance you wanted, and there is nothing the Trainer can do to fix that.
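For reference, here is a minimal sketch of that setup (model, train_ds, and val_ds are placeholders and the metric is illustrative; in recent transformers versions evaluation_strategy is named eval_strategy):
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",      # evaluate every epoch...
    save_strategy="epoch",            # ...and save a checkpoint every epoch (must match the evaluation strategy)
    load_best_model_at_end=True,      # reload the best checkpoint when training finishes
    metric_for_best_model="eval_loss",
    greater_is_better=False,          # lower eval_loss is better
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
trainer.save_model("best_model")      # saves the weights currently held by the trainer, i.e. the best checkpoint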
I understand. So if I am saving multiple checkpoints, the model saved at the end corresponds to the checkpoint with the best validation score rather than the final weights.
import os
from transformers.trainer_callback import TrainerState

save_dir = "your_trainer_save_directory"

# Keep only the checkpoint-XXXX sub-directories and sort them by step number
ckpt_dirs = [d for d in os.listdir(save_dir) if d.startswith("checkpoint-")]
ckpt_dirs = sorted(ckpt_dirs, key=lambda x: int(x.split('-')[1]))
last_ckpt = ckpt_dirs[-1]

# The trainer_state.json of the last checkpoint records which checkpoint was best
state = TrainerState.load_from_json(f"{save_dir}/{last_ckpt}/trainer_state.json")
print(state.best_model_checkpoint)  # path to your best checkpoint
If the above code breaks or doesn't work because of an API or versioning change, you could trace the GitHub source starting from the equivalent of training_args.load_best_model_at_end to see how the best model checkpoint directory is determined.
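Once state.best_model_checkpoint is known, the weights can be reloaded from that directory; a sketch, assuming a sequence classification model (substitute whatever model class you actually trained):
from transformers import AutoModelForSequenceClassification, AutoTokenizer

best_dir = state.best_model_checkpoint
model = AutoModelForSequenceClassification.from_pretrained(best_dir)
# The tokenizer can only be loaded from the checkpoint if it was saved there
# (e.g. because it was passed to the Trainer); otherwise load it from your original save directory.
tokenizer = AutoTokenizer.from_pretrained(best_dir)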
The save_total_limit parameter of the TrainingArguments object can be set to 1 so that, combined with load_best_model_at_end=True, only the best checkpoint is kept on disk.
Note that the documentation says that when the best checkpoint and the last one differ, both may be kept at the end. However, I have not seen this scenario so far.
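A sketch of that configuration (argument values are illustrative):
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,              # delete older checkpoints, keeping at most one rotating checkpoint
    load_best_model_at_end=True,     # the best checkpoint is never deleted, so two directories may coexist
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)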