This is my config; the learning rate is set to "auto" and is supposed to be initialized from TrainingArguments:
```json
{
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1e9,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e9,
    "contiguous_gradients": true
  },
  "fp16": {
    "enabled": "auto",
    "auto_cast": true,
    "loss_scale": 0,
    "initial_scale_power": 32,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": "auto"
    }
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": true
}
```
I'm just wondering how to achieve different learning rates for different groups of parameters.
The thread "How to set different learning rates for different parameters in the model? - #5 by Alanturner2" has some solutions, but they don't exactly fit my scenario.
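For context, what I have in mind is something like the sketch below: subclass Trainer and override create_optimizer to build two parameter groups with different learning rates. The "classifier" name filter and the two lr values are placeholders for illustration, and my understanding is that the "optimizer" block would then have to be removed from the DeepSpeed config above so that the Trainer-supplied optimizer is used instead of DeepSpeed's:

```python
from torch.optim import AdamW
from transformers import Trainer

class GroupedLRTrainer(Trainer):
    def create_optimizer(self):
        if self.optimizer is None:
            base_lr = 1e-5  # placeholder value for the backbone
            head_lr = 1e-4  # placeholder value for the head
            # "classifier" is a hypothetical name filter; adjust to the model.
            head_params = [p for n, p in self.model.named_parameters()
                           if "classifier" in n and p.requires_grad]
            base_params = [p for n, p in self.model.named_parameters()
                           if "classifier" not in n and p.requires_grad]
            # Build AdamW with two parameter groups, mirroring the betas/eps
            # from the DeepSpeed config above.
            self.optimizer = AdamW(
                [
                    {"params": base_params, "lr": base_lr},
                    {"params": head_params, "lr": head_lr},
                ],
                betas=(0.9, 0.999),
                eps=1e-8,
                weight_decay=self.args.weight_decay,
            )
        return self.optimizer
```

From the DeepSpeed integration docs, my understanding is that when the config has no "optimizer" entry, the optimizer created by the Trainer is passed through to DeepSpeed, but I'm not sure whether this is the intended way to get per-group learning rates, or how it interacts with the remaining "auto" fields.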