Hello,
I want to continue training a pretrained model. Training was stopped at some point because it took too long to run (8 hours per epoch), but we realized the loss curve was still likely to keep decreasing, so we decided to resume training from the last saved checkpoint.
We are using the AutoModelForMaskedLM model, with an initial learning rate of 1e-4 and lr_scheduler_type='linear'.
It seems that the learning rate decreases over the epochs (right? I can't find, in the tutorials or in the documentation, the exact equation that is used to set the learning rate over the course of training).
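For what it's worth, my current understanding is that the 'linear' scheduler does something like the sketch below (this is just my reading of get_linear_schedule_with_warmup, with made-up step counts, so please correct me if it's wrong):

def linear_lr(step, base_lr=1e-4, num_warmup_steps=100, num_training_steps=1000):
    # My understanding (may be wrong): ramp up linearly during warmup,
    # then decay linearly to 0 over the remaining training steps.
    if step < num_warmup_steps:
        factor = step / max(1, num_warmup_steps)
    else:
        factor = max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor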
In the original run, the losses for the last epochs were slowly decreasing below 0.55, and had reached 0.546 when the model was saved.
However, when I resumed training, the loss went up to 0.6 after the first epoch. I empirically tested a learning rate of 1e-6 and the loss went to 0.5454, which is in the expected range.
So, I want to know if it is possible to recover the learning rate that was used at each epoch of the original run (is it saved anywhere in the checkpoint files?), or at least to log/print the learning rate at each training epoch. How can I do that?
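In case it helps, this is the kind of thing I was hoping to do: read the learning rates back from the checkpoint folder. It is only a sketch, assuming the checkpoint directory contains a trainer_state.json with a log_history list (the path and keys below are my assumptions):

import json, os

checkpoint_dir = "path/to/checkpoint-XXXX"  # hypothetical checkpoint folder
state_path = os.path.join(checkpoint_dir, "trainer_state.json")  # assuming this file exists

with open(state_path) as f:
    state = json.load(f)

# Print whatever learning rates were logged during the original run
for entry in state.get("log_history", []):
    if "learning_rate" in entry:
        print(entry.get("epoch"), entry.get("step"), entry["learning_rate"])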
Is the learning rate schedule restarting from the beginning, so that I am losing all the progress the linear scheduler had made?
Also, what should I do to continue training with exactly the same learning rate schedule, as if the original training had never stopped?
It seems that I have to pass the return value of transformers.get_linear_schedule_with_warmup() to something in the Trainer, but in order to build that scheduler I first need to get the optimizer from the Trainer (another thing I don't know how to do). Any ideas on how to do that?
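A rough sketch of what I had in mind (I am not sure the optimizers argument is the right way to hand them to the Trainer; the step counts below are placeholders, and presumably the scheduler would also have to be fast-forwarded to the step where training stopped):

import torch
from transformers import Trainer, get_linear_schedule_with_warmup

# Build the optimizer and scheduler myself instead of letting the Trainer create them
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1000,      # placeholder
    num_training_steps=100000,  # placeholder: total steps of the original run
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
    tokenizer=tokenizer,
    optimizers=(optimizer, scheduler),  # is this how I pass my own optimizer/scheduler?
)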
Lastly, suppose I want to set a specific learning rate for each epoch: how do I communicate that to the optimizer/Trainer so it is used in trainer.train()? In other words, how can I set the learning rate manually?
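For that last point, I was imagining something like this sketch, using a plain PyTorch LambdaLR keyed to the epoch (the per-epoch values and steps_per_epoch are made up), and then passing it to the Trainer the same way as above:

import torch
from torch.optim.lr_scheduler import LambdaLR

epoch_lrs = [1e-4, 5e-5, 1e-5]  # made-up per-epoch learning rates
steps_per_epoch = 5000          # placeholder

# Base lr of 1.0 so the lambda below returns the absolute learning rate
optimizer = torch.optim.AdamW(model.parameters(), lr=1.0)

def lr_for_step(step):
    epoch = min(step // steps_per_epoch, len(epoch_lrs) - 1)
    return epoch_lrs[epoch]

scheduler = LambdaLR(optimizer, lr_lambda=lr_for_step)
# ...then pass optimizers=(optimizer, scheduler) to the Trainer as above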
Thank you.
The code to load the model is something like this (I only show the training arguments relevant to the question):
from transformers import Trainer, TrainingArguments, AutoModelForMaskedLM

# Load the model weights from the last saved checkpoint
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

training_args = TrainingArguments(
    learning_rate=1e-4,
    lr_scheduler_type='linear',
    warmup_steps=0,
    warmup_ratio=0.1,
    # ... other training arguments omitted
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
    tokenizer=tokenizer,
)
trainer.train(resume_from_checkpoint=model_checkpoint)  # resume from the saved checkpoint