Setting optimizer parameters with DeepSpeed

I’ve been using DeepSpeed with Accelerate to launch the standard Transformers Trainer without specifying a json config file for DeepSpeed. I just noticed this snippet in the logs:

[INFO|deepspeed.py:303] 2024-01-22 12:27:21,324 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/norman/.cache/torch_extensions/py311_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Using /home/norman/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.933143138885498 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.970329523086548 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.9743385314941406 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.976152181625366 seconds
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000020, betas=(0.900000, 0.999000), weight_decay=0.010000, adam_w=1

This is even though I’ve set weight decay to 0.0 on the command line (i.e. accelerate launch ... --weight_decay 0.0).

Are the logs accurate? Is the weight decay value used in the optimizer 0.01 or 0.0? Do I need to set the weight decay through an explicit json config for DeepSpeed?
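
For reference, if an explicit config does turn out to be necessary, below is a minimal, untested sketch of what I would expect to pass. I’m assuming here that TrainingArguments accepts a dict for its deepspeed argument in place of a json file path, and that the "auto" values get filled in from the TrainingArguments themselves; the ZeRO stage, batch-size keys, and output_dir are just placeholders, not my actual setup.

from transformers import TrainingArguments

# Untested sketch of an explicit DeepSpeed config with the optimizer section
# spelled out; stage/offload/batch-size entries are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,                           # placeholder ZeRO stage
        "offload_optimizer": {"device": "cpu"},
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",           # hoping this resolves to --weight_decay 0.0
        },
    },
}

training_args = TrainingArguments(
    output_dir="out",       # placeholder
    weight_decay=0.0,
    deepspeed=ds_config,    # a dict in place of a path to a json file
)

If the command-line value already takes precedence over the weight_decay=0.010000 shown in the cpu_adam log, though, I’d rather keep skipping the config file entirely.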