Does fp16 training compromise accuracy?

Mixed precision training with fp16 only pays off on certain hardware, and in some cases it leads to training instability, particularly when the model was pre-trained in bfloat16 and is then trained in fp16.
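
As a minimal sketch (the `output_dir` value is just a placeholder), this is how the precision flags are toggled when training with the `transformers` `Trainer`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,   # fp16 mixed precision; only gives a speedup on GPUs with fp16 support (Volta/Turing or newer)
    # bf16=True, # consider this instead on Ampere+ GPUs if the checkpoint was pre-trained in bfloat16
)
```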

For older GPUs (before Volta/Turing), fp16 provides no speedup and actually requires more memory, because both the fp16 copies and the fp32 master values are kept in memory. I’d recommend reading this: Performance and Scalability: How To Fit a Bigger Model and Train It Faster
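
A minimal sketch of what mixed precision does under the hood in plain PyTorch (a toy linear model, just for illustration): the parameters stay in fp32 while autocast runs eligible ops in fp16, which is why memory use goes up on GPUs that get no speedup from it.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()           # weights stored in fp32
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                  # rescales the loss to avoid fp16 underflow

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast():                       # eligible ops run with fp16 copies of the fp32 weights
    loss = model(x).float().pow(2).mean()

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```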
