I’m training a model with the following parameters:
Seq2SeqTrainingArguments(
    output_dir = "./out",
    overwrite_output_dir = True,
    do_train = True,
    do_eval = True,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    per_device_eval_batch_size = 8,
    learning_rate = 1.25e-5,
    warmup_steps = 1,
    save_total_limit = 1,
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    logging_strategy = "epoch",
    num_train_epochs = 5,
    gradient_checkpointing = True,
    fp16 = True,
    predict_with_generate = True,
    generation_max_length = 225,
    report_to = ["tensorboard"],
    load_best_model_at_end = True,
    metric_for_best_model = "wer",
    greater_is_better = False,
    push_to_hub = False,
)
After training finished, I looked at the file trainer_state.json, and it seems that the learning rate is not fixed.
Here are the values of learning_rate and step:
learning_rate   step
1.0006e-05      1033
7.5062e-06      2066
5.0058e-06      3099
2.5053e-06      4132
7.2618e-09      5165
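(The values above come from the `log_history` list inside trainer_state.json; a minimal sketch of how they can be pulled out — the sample dict here is a stand-in for the real file contents, not the full log:)

```python
import json

# Stand-in for json.load(open("./out/trainer_state.json")); the real file
# has the same "log_history" layout, with extra fields such as "loss"/"epoch".
sample_state = {
    "log_history": [
        {"step": 1033, "learning_rate": 1.0006e-05, "loss": 0.9},
        {"step": 2066, "learning_rate": 7.5062e-06, "loss": 0.7},
        {"step": 5165, "learning_rate": 7.2618e-09, "loss": 0.5},
    ]
}

def lr_by_step(state):
    """Return (step, learning_rate) pairs from a trainer_state-like dict."""
    return [(e["step"], e["learning_rate"])
            for e in state.get("log_history", [])
            if "learning_rate" in e]

print(lr_by_step(sample_state))
```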
It seems that the learning rate is not held fixed at 1.25e-5 (after step 1).
What am I missing?
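For what it’s worth, the logged values are close to what a linear warmup/decay schedule would produce with these settings. A quick check (the formula below is the standard linear-with-warmup rule, which I’m assuming here for comparison — it is not taken from the trainer itself):

```python
# Assumed linear warmup/decay rule:
#   lr(step) = base_lr * step / warmup                           for step < warmup
#   lr(step) = base_lr * (total - step) / (total - warmup)       otherwise
base_lr, warmup, total = 1.25e-5, 1, 5165  # values from the config and log above

def linear_lr(step):
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))

for step in (1033, 2066, 3099, 4132, 5165):
    print(step, f"{linear_lr(step):.4e}")
```

The computed values agree with the logged ones to within a step or two (logging and gradient accumulation can offset the recorded step slightly), so the schedule looks like a linear decay rather than a constant learning rate.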