`TrainingArguments` to do pure bf16 training?

I am trying to fine-tune some large language models, and I was wondering if there is a configuration for pure bf16 training, i.e. parameters, gradients, and optimizer states all in bf16.

I already know about `--bf16 True`, which uses torch AMP, but I would rather not use mixed precision at all if possible.
I know there are stability risks involved, but dropping mixed precision would reduce the memory footprint, and keeping everything in bf16 should keep things fast.
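To make it concrete, here is a minimal plain-PyTorch sketch (no Trainer, no AMP) of what I mean by pure bf16: the model is cast to bf16 up front, so gradients and Adam state inherit that dtype, with no fp32 master copy of the weights. The tiny `nn.Linear` model and the fp32 loss reduction are just illustrative choices, not anything from a specific recipe.

```python
import torch
import torch.nn as nn

# Pure bf16: parameters live in bf16, so gradients and optimizer
# state are created in bf16 too -- no autocast, no fp32 master weights.
model = nn.Linear(16, 4).to(torch.bfloat16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 16, dtype=torch.bfloat16)
# Reducing the loss in fp32 is a common stability tweak; the
# backward pass still produces bf16 gradients for bf16 params.
loss = model(x).float().pow(2).mean()
loss.backward()
opt.step()

p = next(model.parameters())
print(p.dtype, p.grad.dtype, opt.state[p]["exp_avg"].dtype)
```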

I am currently exploring how Lightning does this with their `bf16-true` mode, with the goal of replicating it with the Trainer.
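For reference, this is the Lightning setting I mean; it casts the module's weights to bf16 instead of wrapping forward passes in autocast, so the optimizer then runs on bf16 parameters:

```python
import lightning as L

# "bf16-true" casts model weights to bf16 (in contrast to
# "bf16-mixed", which keeps fp32 weights and uses AMP autocast).
trainer = L.Trainer(precision="bf16-true")
```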