HF accelerate DeepSpeed plugin does not use custom optimizer or scheduler

Hello,

I am trying to launch training of a large model in a multi-node/multi-GPU setting with "accelerate" using the DeepSpeed plugin (no DS config file), with 8-bit Adam from bitsandbytes (BnB) and a cosine-annealing LR scheduler. However, DeepSpeed doesn't seem to use the 8-bit Adam set in my Python script; it falls back to regular AdamW, even though the documentation indicates that custom optimizers/schedulers should work here. Any idea what's happening? Is there a specific setup required for this?

Thanks!


It looks like there is a working path through the Trainer: setting the training argument optim="adamw_bnb_8bit" does use the BnB 8-bit optimizer. Not sure why the custom instantiation isn't working, though.


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.