Continue fine-tuning with Trainer() after completing the initial training process

Hey all,

Let’s say I’ve loaded a model with from_pretrained() and fine-tuned it for 40 epochs. Looking at the resulting plots, I can see that there’s still some room for improvement, and perhaps I could train it for a few more epochs.

I realize that in order to continue training, I have to call trainer.train(resume_from_checkpoint=path_to_checkpoint). However, I don’t know how to specify the number of additional epochs I want it to train for, since it has already finished the 40 epochs I initially instructed it to train for.
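That is, something like this (the checkpoint path is just a placeholder for whatever directory my first run produced):

```python
# Resume training from a saved checkpoint; the path below is a
# placeholder for an actual checkpoint directory from the first run.
trainer.train(resume_from_checkpoint="output/checkpoint-40000")
```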

Do I have to define a new Trainer? And if I define a new Trainer, can I also change the learning rate? On top of these questions, there is also the learning rate scheduler. The Trainer’s default is OneCycleLR, if I’m not mistaken, which means that by the end of my previous 40 epochs the learning rate was 0. If I restart the training process, will the whole scheduler restart as well?

Thanks for any help in advance.

Yes, you will need to start a new training run with new training arguments, since you are not resuming from a checkpoint.
The Trainer uses a linear decay by default, not the 1cycle policy, so your learning rate did end up at 0 at the end of the first training, and it will restart at the value you set in your new training arguments.
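As a minimal sketch, assuming your model and datasets are already defined (the output directory and hyperparameter values below are just examples):

```python
from transformers import Trainer, TrainingArguments

# New arguments for the second run: a fresh epoch count and a fresh
# starting learning rate; the linear decay restarts from this value.
training_args = TrainingArguments(
    output_dir="finetuned-continued",  # example path
    num_train_epochs=5,
    learning_rate=1e-5,
)

trainer = Trainer(
    model=model,                 # the model you already fine-tuned
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```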

Hi, thanks for answering.

Ah, so it’s more like restarting the training from a checkpoint, not actually continuing exactly from where you left off, at least as far as the learning rate is concerned. I suppose I could continue the training by setting a very low learning rate, to approximate the value it would have had if training had continued normally past epoch 40. Regarding the scheduler, you are right, it is linear decay, though I think it also has optional warm-up steps, in which case the shape resembles a one-cycle schedule.
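Something like this is what I have in mind (all values are just illustrative):

```python
from transformers import TrainingArguments

# Start low to roughly match where the old linear decay left off,
# and skip warm-up since the model is already well into training.
training_args = TrainingArguments(
    output_dir="finetuned-continued",  # example path
    num_train_epochs=5,
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_steps=0,
)
```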

Also, regarding the output_dir argument of the new TrainingArguments object I will define to restart training: do I pass any path I want there? I could use a different path from the previous one, or the old one. If I use the old one, is it usual to set overwrite_output_dir=True so that it overwrites the old checkpoints?

Thanks in advance.

It depends on what you want, but you can re-use the same output_dir if you don’t mind overwriting your old checkpoints.
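For instance (the directory name is just an example):

```python
from transformers import TrainingArguments

# Re-using the first run's directory; overwrite_output_dir signals
# that its contents (including old checkpoints) may be overwritten.
training_args = TrainingArguments(
    output_dir="my-finetuned-model",  # same directory as the first run
    overwrite_output_dir=True,
)
```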


Alright, thank you. Have a nice day

By the way, @sgugger, in my case, where I don’t want to actually resume training because the 40 epochs are completed, do I still pass a checkpoint path to the trainer, or do I just call trainer.train()?

You should instantiate your model from the trained version, then launch trainer.train().
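As a rough sketch (the auto class and paths are placeholders; pick whatever matches your task and save location):

```python
from transformers import AutoModelForSequenceClassification, Trainer

# Load the weights saved at the end of the first run, then start a
# brand-new training run: no resume_from_checkpoint needed.
model = AutoModelForSequenceClassification.from_pretrained("my-finetuned-model")

trainer = Trainer(
    model=model,
    args=training_args,          # the new TrainingArguments
    train_dataset=train_dataset,
)
trainer.train()
```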


Great, I’ll just load it with from_pretrained() and train it with new TrainingArguments. Thank you 🙂
