Is it possible that the model can make use of the added precision during finetuning? Or is it the case that if a model was initially trained with mixed precision, then all downstream training should use the same (or less) precision?
Hi @nadahlberg, transformer models are often sensitive to FP16 training because of the Layer Norms involved. The model can definitely benefit from the added precision, but that is not because it was originally trained in FP32; it is because of the transformer architecture itself.
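If you do want to fine-tune in mixed precision despite that sensitivity, one common workaround is to keep the LayerNorm modules in FP32 while the rest of the model runs in FP16. A minimal sketch (the checkpoint name below is just a placeholder, not something from this thread):

```python
import torch
from transformers import AutoModel

# Placeholder checkpoint used only for illustration.
model = AutoModel.from_pretrained("bert-base-uncased")

# Cast the whole model to FP16, then restore LayerNorm modules to FP32:
# the mean/variance reductions inside LayerNorm are where half precision
# most often loses accuracy or overflows.
model.half()
for module in model.modules():
    if isinstance(module, torch.nn.LayerNorm):
        module.float()
```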