Does fp16 training compromise accuracy?
If not, why isn’t it enabled by default?
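Mixed precision training (fp16) is only possible on certain hardware, and in some cases it causes training instability, depending on whether the model was pre-trained using bfloat16. bfloat16 has the same exponent range as fp32, so a model pre-trained in it can produce activations larger than the fp16 maximum of 65504, and those overflow to inf in fp16.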
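To see the range problem concretely, here is a minimal stdlib-only Python sketch. It assumes IEEE 754 half precision via the struct module's "e" format for fp16, and emulates bfloat16 by keeping the top 16 bits of a float32. That is truncation, whereas real hardware usually rounds to nearest even. The helper names to_fp16 and to_bf16 are hypothetical, for illustration only.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to fp16 (IEEE 754 half precision) and back to a Python float."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses values beyond the fp16 range (max 65504);
        # on a GPU the same value would become inf instead.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 pattern.

    Assumption: plain truncation, not the round-to-nearest-even used by hardware.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (70000.0, 1e-8):
    print(f"{value:g}: fp16={to_fp16(value)!r}, bf16={to_bf16(value)!r}")
```

With these helpers, 70000.0 overflows to inf in fp16 but stays finite (69632.0) in the bfloat16 emulation, and 1e-8 underflows to 0.0 in fp16. bfloat16 trades mantissa precision for fp32's exponent range, which is why values that are harmless in bfloat16 can break fp16 training.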
For older GPUs (before Volta/Turing), fp16 provides no speedup and requires more memory, because both the fp16 and the fp32 copies of the weights are kept in memory. I’d recommend reading this: Performance and Scalability: How To Fit a Bigger Model and Train It Faster
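A back-of-the-envelope sketch of the weight memory: a plain fp32 model stores 4 bytes per parameter, while mixed precision keeps those fp32 weights plus a 2-byte fp16 copy. The function name weight_bytes and the 1B-parameter model size are hypothetical, and the sketch deliberately counts weights only (no gradients, optimizer states or activations), so it is not a full memory estimate.

```python
def weight_bytes(num_params: int, mixed_precision: bool) -> int:
    """Bytes needed to hold the model weights alone."""
    fp32_weights = 4 * num_params  # full-precision weights, always kept
    fp16_copy = 2 * num_params if mixed_precision else 0  # extra half-precision copy
    return fp32_weights + fp16_copy


params = 1_000_000_000  # hypothetical 1B-parameter model
for mixed in (False, True):
    gb = weight_bytes(params, mixed) / 1e9
    print(f"mixed_precision={mixed}: {gb:.1f} GB of weights")
```

For a 1B-parameter model this gives 4.0 GB without and 6.0 GB with mixed precision, which is the extra cost an older GPU pays without getting a speedup in return.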