Mixed precision for bfloat16-pretrained models

We have started compiling a wiki of how different models were pre-trained. Please add what you know there. Thanks!
