As bfloat16 hardware support becomes more widely available, there is an emerging trend of training in bfloat16. This leads to the issue of not being able to finetune such models in mixed precision (or eval in fp16), be it amp, apex or deepspeed/fairscale. Last week I spent some time sitting with the …

Mixed precision for bfloat16-pretrained models

stas April 21, 2021, 11:37pm 3

We have started compiling a wiki of how different models were pre-trained. Please add your knowledge there - thanks!
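
For anyone wondering why the pre-training precision matters for the fp16 issue discussed in this thread, here is a rough stdlib-only sketch. It is not taken from any library: `to_fp16` and `to_bf16` are hypothetical helper names, and the bf16 conversion simply truncates a float32 to its top 16 bits (real hardware usually rounds instead), so treat it as an illustration only. The idea is that bf16 keeps the float32 exponent range while fp16 tops out at 65504, so a value that a bf16-trained model produces without trouble can overflow once it is cast to fp16.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision; values beyond the fp16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values too large for fp16; model that as overflow to inf
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: plain truncation, not round-to-nearest as real hardware may do.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


big = 70000.0  # hypothetical activation value, just above the fp16 maximum of 65504
print(to_fp16(big))  # inf
print(to_bf16(big))  # 69632.0 (finite, but coarse)

small = 1.001
print(to_fp16(small))  # 1.0009765625 (fp16 has more mantissa bits)
print(to_bf16(small))  # 1.0
```

So the trade-off goes both ways: bf16 has the range but little precision, fp16 has more precision but a much smaller range, which is why it helps to know how a given checkpoint was pre-trained.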

