I resumed training from a checkpoint, with the learning rate in TrainingArguments set to 5e-5. However, the learning rate reported at the first logging step is 2.38e-05, and it keeps decreasing in subsequent steps. How can I set the learning rate to the desired value? I do not understand where 2.38e-05 comes from.
These are my training arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    num_train_epochs=8,
    max_steps=-1,
    evaluation_strategy='epoch',
    eval_steps=0,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    warmup_ratio=0.1,
    warmup_steps=0,
    logging_dir=None,
    logging_strategy='steps',
    logging_steps=50,
    disable_tqdm=disable_tqdm,
    save_strategy='epoch',
    save_steps=0,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    seed=random_state,
    predict_with_generate=True,
    dataloader_num_workers=4,
    save_total_limit=10,
)
sgugger
November 23, 2021, 12:58pm
The default scheduler is a linear decay, which lowers the learning rate at every optimizer step. That is why you see this value: your first log only appears after 50 steps, so it never shows the initial rate.
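To make the linear decay concrete, here is a minimal sketch of the rule in plain Python. It follows the usual shape of a linear schedule with warmup (ramp up during warmup, then a straight line down to zero). The function name `linear_schedule_lr` and the step counts in the example are made up for illustration and are not taken from this run.

```python
def linear_schedule_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at a given optimizer step for linear warmup + linear decay.

    Hypothetical helper for illustration only; it does not call transformers.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Assumed numbers: 1000 total steps, 10% warmup, peak rate 5e-5.
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_schedule_lr(step, 5e-5, 100, 1000))
```

With a schedule like this, the configured `learning_rate` is only reached at the end of warmup, and any later step reports a smaller value. If a fixed rate is what you want, `lr_scheduler_type='constant'` in the training arguments selects a constant schedule. How that interacts with the scheduler state restored from a checkpoint is worth verifying on your own run.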
Here are the first few logs, one every 50 steps. The learning rate barely changes between them, so I assume the very first step was not at 5e-5 either.
{'loss': 0.724, 'learning_rate': 2.3809441323160455e-05, 'epoch': 8.0}
{'loss': 0.5776, 'learning_rate': 2.3809028891343685e-05, 'epoch': 8.0}
{'loss': 0.6006, 'learning_rate': 2.3808616459526912e-05, 'epoch': 8.0}
{'loss': 0.6058, 'learning_rate': 2.3808204027710142e-05, 'epoch': 8.0}
{'loss': 0.5938, 'learning_rate': 2.3807791595893365e-05, 'epoch': 8.0}
{'loss': 0.6377, 'learning_rate': 2.3807379164076595e-05, 'epoch': 8.0}
{'loss': 0.5863, 'learning_rate': 2.3806966732259825e-05, 'epoch': 8.0}
{'loss': 0.5971, 'learning_rate': 2.380655430044305e-05, 'epoch': 8.0}
{'loss': 0.6842, 'learning_rate': 2.380614186862628e-05, 'epoch': 8.0}
{'loss': 0.6386, 'learning_rate': 2.3805729436809508e-05, 'epoch': 8.0}
{'loss': 0.6297, 'learning_rate': 2.3805317004992734e-05, 'epoch': 8.0}
{'loss': 0.6817, 'learning_rate': 2.380490457317596e-05, 'epoch': 8.0}
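As a rough check on these numbers, the arithmetic below estimates the per-step decrement from the first two log lines. It assumes a linear decay from the configured peak of 5e-5 down to zero, which is an assumption about the schedule and not something the logs state directly. The variable names are illustrative.

```python
# Values copied from the training arguments and the first two log lines above.
peak_lr = 5e-5
logging_steps = 50
lr_first = 2.3809441323160455e-05
lr_second = 2.3809028891343685e-05

# Under a linear decay, the rate drops by the same amount at every step.
decay_per_step = (lr_first - lr_second) / logging_steps

# Assumption: the decay runs from peak_lr to 0, so its length is peak / slope.
implied_decay_steps = peak_lr / decay_per_step

# Fraction of the peak rate still left at the first logged step.
fraction_remaining = lr_first / peak_lr

print(f"decay per step:      {decay_per_step:.4e}")
print(f"implied decay steps: {implied_decay_steps:.3e}")
print(f"fraction of peak lr: {fraction_remaining:.3f}")
```

Under those assumptions the drop is on the order of 8e-12 per step, and the first logged value is a bit under half of the peak. That is what you would expect if the run resumed partway along a long linear decay, not at its start.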
f3n1Xx
November 23, 2021, 4:56pm
I am also having the same issue. There is a lot of confusion regarding the metrics.