There is an emerging need to know how a given model was pre-trained: fp16, fp32, or bf16, so that one doesn't try to use an fp32-pretrained model in fp16 mode. Most recently we have been flooded with users attempting to run bf16-pretrained (bfloat16!) models under fp16, which is very problematic since the fp16 and bf16 numerical ranges don't overlap well.
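To illustrate why this is a problem, here is a minimal PyTorch sketch (the magnitude is made up purely for demonstration): bf16 shares fp32's exponent range, while fp16 tops out at 65504, so values that are routine for a bf16-trained model overflow to inf in fp16.

```python
import torch

# fp16's largest representable value is 65504; bf16 goes up to ~3.4e38 (like fp32),
# so a magnitude that is fine in bf16 overflows when cast to fp16.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38

x = torch.tensor([1e20], dtype=torch.bfloat16)  # representable in bf16
print(x.to(torch.float16))                      # tensor([inf], dtype=torch.float16)
```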
We are discussing adding a new field to models that will tell users how they were trained.
Some papers don't disclose how a model was trained, so perhaps other ways of finding out can be used.
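One such way (only a heuristic, not proof) is to peek at the dtype the released checkpoint is stored in. A minimal sketch, assuming you have a local copy of the checkpoint file (the filename below is just the usual pytorch_model.bin):

```python
import torch
from collections import Counter

# Path is an example -- point it at a locally downloaded checkpoint.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# The storage dtype is only a hint about the training precision: e.g. fp16
# mixed-precision training keeps fp32 master weights, and those may be what
# was saved.
print(Counter(str(t.dtype) for t in state_dict.values()))
# e.g. Counter({'torch.float16': <number of tensors>}) for an fp16 checkpoint
```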
I have made this into a wiki post, so if you can help compile the knowledge base - you know for sure which precision mode a model belongs to and you can cite a reference - please add an entry below.
Notes:
- Let's focus on official models here - i.e. ones with papers (since some models have hundreds of derived/fine-tuned checkpoints).
- One entry per model is enough, unless you know that different checkpoints were trained with different precision, which is for example the case with the EleutherAI/gpt-neo checkpoints (one in bf16 and another in fp32).
- Typically, if a model was trained on TPU v2 or higher, it's almost certainly bfloat16.
- We are looking for definitive data with references that clearly state how the model was trained. If you are not sure, please don't add anything.
Thank you!
This is a WIKI post, so please add the data directly.
Precision of Pre-trained Models (Wiki)
float16 (mixed precision)
- allenai/longformer - paper: "we employed mixed precision training (floating points 16 and 32) using apex to reduce memory consumption and speed-up training. However, we kept the attention computation in fp32 to avoid numerical instability issues."
- allenai/led - same as allenai/longformer
- lvwerra/codeparrot - informed by the creator of the model
- facebook/m2m100_418M (and others) - train info
- eleutherai/gpt-neox-20b (doesn't exist yet, but included for the sake of future-proofing) - as shown in the configs. The paper also states that the model was trained in fp16; see "Appendix B: Full Configuration Details." Finally, Stella Biderman's official announcement on Twitter includes a link to download both "full" weights and "slim" weights, which implies mixed precision was used.
bfloat16 (mixed precision)
- google/mobilebert - paper: "we train IB-BERT_LARGE on 256 TPU v3 chips"
- eleutherai/gpt-neo-1.3b - shown in the config file
- eleutherai/gpt-j-6b - shown in the GitHub readme
- google/pegasus-cnn_dailymail - XXX: needs reference
- google/pegasus-xsum - XXX: needs reference
- google/mt5 - most likely the same as t5
- t5 - paper: "TPU v3 chips"
- bigscience/T0 and other T0* models - trained on TPUs, confirmed on the BigScience Slack
float32 (full precision)
- EleutherAI/gpt-neo-2.7B - the model's config file doesn't specify precision and the codebase defaults to fp32
- gsarti/it5-base and other it5-* models - stated by the creator (JAX-trained)
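Once you know (or have good reason to believe) which precision bucket a model falls into, you can load it accordingly. A minimal sketch, assuming a transformers version whose from_pretrained accepts the torch_dtype argument (the model id is just one example from the bfloat16 list above):

```python
import torch
from transformers import AutoModelForCausalLM

# gpt-j-6b is listed above under bfloat16, so prefer bf16 (or fp32) over fp16.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.bfloat16,  # use torch.float32 on hardware without bf16 support
)
```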
Please keep your comments on topic; it should be easy to start a new thread if you have related questions/issues to discuss.