I am trying to fine-tune some large language models, and I was wondering if there is a configuration for pure bf16 training, i.e. parameters, gradients, and optimizer states all in bf16?
I already know about `--bf16 True`, which uses torch AMP, but I don't want to use mixed precision at all if possible.
I know there are stability risks involved, but getting rid of mixed precision would reduce the memory footprint (no fp32 weights or fp32 optimizer states to keep around), and having everything in bf16 should keep things fast.
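To be concrete, this is what I mean by pure bf16, written as a minimal plain-PyTorch sketch (the model and data here are just placeholders):

```python
import torch
from torch import nn

# Toy stand-in model; casting it to bf16 up front means the parameters,
# and therefore the gradients and optimizer states, all live in bf16.
model = nn.Linear(1024, 1024).to(device="cuda", dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
y = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(x), y)  # no autocast, no GradScaler
    loss.backward()   # grads come out in bf16 because the params are bf16
    optimizer.step()  # AdamW allocates its exp_avg/exp_avg_sq states in bf16 too
```

Nothing here ever touches fp32, which is exactly the setup I'd like to get out of the Trainer.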
I am currently exploring how Lightning does this with their `bf16-true` precision mode, so that I can replicate it with the HF Trainer.
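As far as I can tell, Lightning's true-precision mode just casts the module's weights to bf16 and runs without autocast or a GradScaler (unlike `precision="bf16-mixed"`):

```python
import lightning as L

# Lightning 2.x: "bf16-true" casts module weights to bf16 and skips
# AMP entirely, as opposed to "bf16-mixed" which keeps fp32 params.
trainer = L.Trainer(precision="bf16-true", max_steps=1000)
```

The closest thing I've come up with for transformers is loading the model in bf16 myself and leaving `bf16=False` so AMP never kicks in (model name and arguments below are just placeholders, this is my own experiment, not a documented pure-bf16 switch):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Load weights directly in bf16 and do NOT enable --bf16, so torch AMP
# is never used; optimizer states are then created in bf16 from the params.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
args = TrainingArguments(output_dir="out", bf16=False)
# ...then build Trainer(model=model, args=args, train_dataset=...) as usual.
```

Is this the right approach, or is there an official configuration for it?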