Setting warmup_steps=1 doesn't fix the learning rate, which changes every epoch

I’m training a model with the following parameters:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir                   = "./out", 
    overwrite_output_dir         = True,
    do_train                     = True,
    do_eval                      = True,
    
    per_device_train_batch_size  = 2, 
    gradient_accumulation_steps  = 4,
    per_device_eval_batch_size   = 8, 
    
    learning_rate                = 1.25e-5,
    warmup_steps                 = 1,
    
    save_total_limit             = 1,
       
    evaluation_strategy          = "epoch",
    save_strategy                = "epoch",
    logging_strategy             = "epoch",  
    num_train_epochs             = 5,   
    
    gradient_checkpointing       = True,
    fp16                         = True,    
        
    predict_with_generate        = True,
    generation_max_length        = 225,
          
    report_to                    = ["tensorboard"],
    load_best_model_at_end       = True,
    metric_for_best_model        = "wer",
    greater_is_better            = False,
    push_to_hub                  = False,
)

After training finished, I looked at the trainer_state.json file, and it seems that the learning rate is not fixed.
Here are the values of learning_rate and step:

learning_rate    step
1.0006e-05       1033
7.5062e-06       2066
5.0058e-06       3099
2.5053e-06       4132
7.2618e-09       5165
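
For reference, these values come from the log_history list inside trainer_state.json. Here is a minimal sketch of how to pull them out (the checkpoint path is an assumption; adjust it to your last checkpoint directory):

import json

# Path is an assumption -- point this at your last checkpoint directory
with open("./out/checkpoint-5165/trainer_state.json") as f:
    state = json.load(f)

# Each training log entry records the learning rate at that logged step
for entry in state["log_history"]:
    if "learning_rate" in entry:
        print(entry["learning_rate"], entry["step"])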

It seems that the learning rate is not fixed at 1.25e-5 (after step 1).
What am I missing?

Hi @laro1, try passing lr_scheduler_type="constant_with_warmup" to your training args. The default is "linear", which decays the learning rate linearly to zero after warmup.
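
For example, a minimal sketch with only the scheduler argument added (the rest of your arguments stay as in your original setup):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir        = "./out",
    learning_rate     = 1.25e-5,
    warmup_steps      = 1,
    # constant_with_warmup ramps the LR up over warmup_steps,
    # then holds it at learning_rate for the rest of training
    lr_scheduler_type = "constant_with_warmup",
    # ... rest of the arguments unchanged
)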

