Learning rate with DeepSpeed is fixed despite lr set to "auto"

I’m trying to fine-tune a LLaMA model with a learning-rate schedule, but the learning rate is reported as a fixed 5e-05 at every single step. Given that the DeepSpeed config has "lr": "auto", why isn’t the learning rate changing? The eval loss improves at every evaluation step, but very slowly.

I’m running my code like this:
deepspeed train_script.py

Relevant parts of code:

training_arguments = transformers.TrainingArguments(
    bf16=True,
    num_train_epochs=NUM_EPOCHS,
    logging_strategy="steps",
    logging_steps=10,
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=10,
    save_steps=20,
    output_dir=OUTPUT_DIR,
    save_total_limit=3,
    deepspeed="ds_config_zero3_offload_param_offload_optimizer.json",  # args.deepspeed_config
)

trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_arguments,
    data_collator=data_collator,
    callbacks=[SavePeftModelCallback],
)

Contents of ds_config_zero3_offload_param_offload_optimizer.json:
{
"fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
},

"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
},

"scheduler": {
    "type": "WarmupLR",
    "params": {
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
},

"zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "offload_param": {
        "device": "cpu",
        "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
},

"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 20,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false

}

Setting "lr": "auto" in the DeepSpeed config just means the value is filled in from your Hugging Face training args; it does not mean the learning rate is automatically adjusted during training.

The Hugging Face learning_rate default is 5e-5, which is why you’re seeing that value. And the LR stays constant because the min warmup LR and max warmup LR in your DeepSpeed scheduler config are both defaulting to 5e-5.

To increase the learning rate, pass a larger learning_rate in your Hugging Face training args. If you want the schedule itself to be something more exotic, you can switch to one of the other scheduler options.
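
For example, something like the following (the 2e-4 and 100 below are placeholder values for illustration, not tuned recommendations); the "auto" fields in your DeepSpeed config should then be filled from these arguments:

training_arguments = transformers.TrainingArguments(
    output_dir=OUTPUT_DIR,
    bf16=True,
    num_train_epochs=NUM_EPOCHS,
    learning_rate=2e-4,   # fills "lr": "auto" and "warmup_max_lr": "auto"
    warmup_steps=100,     # fills "warmup_num_steps": "auto"
    logging_strategy="steps",
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=20,
    save_total_limit=3,
    deepspeed="ds_config_zero3_offload_param_offload_optimizer.json",
)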


If you want to use WarmupLR, setting warmup_min_lr to 0 will increase the learning rate from 0 up to the learning rate specified elsewhere in your training arguments; after that, the learning rate remains constant. If you want the learning rate to decay after reaching the peak, use WarmupDecayLR like below:

    "scheduler": {
      "type": "WarmupDecayLR",
        "params": {
          "warmup_min_lr": 0,
          "warmup_max_lr": "auto",
          "warmup_num_steps": "auto",
          "warmup_type": "linear",
          "total_num_steps": "auto"
        }
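
As far as I can tell from the Hugging Face DeepSpeed integration, the remaining "auto" values in this block are filled from your TrainingArguments: warmup_max_lr from learning_rate, warmup_num_steps from warmup_steps (or warmup_ratio), and total_num_steps from the total number of training steps the Trainer computes.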