According to the mixed precision training paper, I should start from an fp32 base model for mixed precision training.
However, many models are saved in fp16 or bf16. In that case, what is the best format for mixed precision training?
Can I use an fp16 model directly for mixed precision training, or do I have to convert the fp16 model to fp32 first?
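
To make the concern concrete, here is a minimal sketch of why keeping the weights in fp32 might matter. It uses only Python's standard `struct` module to emulate fp16 and fp32 rounding on a single scalar weight. The helper names (`to_fp16`, `to_fp32`, `apply_updates`) and the update size of 1e-4 are assumptions made up for illustration. They are not taken from the paper or from any training framework, and bf16 is not covered.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def apply_updates(weight: float, update: float, steps: int, cast) -> float:
    """Add `update` to `weight` `steps` times, rounding with `cast` after each step."""
    w = cast(weight)
    for _ in range(steps):
        w = cast(w + update)
    return w


# Stand-in for one value loaded from an fp16 checkpoint (hypothetical).
saved_weight = to_fp16(1.0)
# Hypothetical small update, e.g. learning_rate * gradient.
update = 1e-4

# Weight kept in fp16: the gap between fp16 values near 1.0 is about 1e-3,
# so every update is rounded away and the weight never moves.
fp16_result = apply_updates(saved_weight, update, 100, to_fp16)

# Same checkpoint value upcast to fp32 first: the updates accumulate.
fp32_result = apply_updates(to_fp32(saved_weight), update, 100, to_fp32)

print("fp16 weights:", fp16_result)
print("fp32 weights:", fp32_result)
```

In this toy setup the fp16 weight stays at 1.0 while the fp32 copy moves to roughly 1.01. Upcasting an fp16 value to fp32 is exact, so the conversion itself loses nothing. Whether a real framework already does this upcast when loading an fp16 checkpoint is exactly what I am unsure about.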