Setting optimizer parameters with DeepSpeed

I’ve been using DeepSpeed with Accelerate to launch the standard Transformers Trainer without specifying a json config file for DeepSpeed. I just noticed this snippet in the logs:

[INFO|deepspeed.py:303] 2024-01-22 12:27:21,324 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/norman/.cache/torch_extensions/py311_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.933143138885498 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.970329523086548 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.9743385314941406 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.976152181625366 seconds
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000020, betas=(0.900000, 0.999000), weight_decay=0.010000, adam_w=1

This is even though I’ve set weight decay to 0.0 on the command line (i.e. accelerate launch ... --weight_decay 0.0).

Are the logs accurate? Is the weight decay value used in the optimizer 0.01 or 0.0? Do I need to set the weight decay through an explicit json config for DeepSpeed?
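
For reference, if an explicit config does turn out to be necessary, below is a minimal, untested sketch of what I would expect to pass. I’m assuming here that TrainingArguments accepts a dict for its deepspeed argument in place of a json file path, and that the "auto" values get filled in from the TrainingArguments themselves; the ZeRO stage, batch-size keys, and output_dir are just placeholders, not my actual setup.

from transformers import TrainingArguments

# Untested sketch of an explicit DeepSpeed config with the optimizer section
# spelled out; stage/offload/batch-size entries are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,                           # placeholder ZeRO stage
        "offload_optimizer": {"device": "cpu"},
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",           # hoping this resolves to --weight_decay 0.0
        },
    },
}

training_args = TrainingArguments(
    output_dir="out",       # placeholder
    weight_decay=0.0,
    deepspeed=ds_config,    # a dict in place of a path to a json file
)

If the command-line value already takes precedence over the weight_decay=0.010000 shown in the cpu_adam log, though, I’d rather keep skipping the config file entirely.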