T5 and Pegasus don’t really work in fp16 because they produce activations that overflow fp16’s range (they were trained in bfloat16, which has a much larger range). Has anyone read/seen/heard anything about finetuning/scaling models so that their activations fit in fp16 (or, more generally, to encourage smaller-magnitude activations)?
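For reference, the dynamic range gap is big:

```python
import torch

# fp16 tops out around 65504, while bfloat16 covers roughly the same range as
# fp32 (~3.4e38), so activations that were fine during bfloat16 pretraining can
# overflow straight to inf once you run the model in fp16.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
```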
I tried one experiment on google/pegasus-xsum where I finetune with the summarization LM loss and add some additional losses based on the magnitude of the hidden states, but I haven’t weighted them well (the model instantly forgets how to summarize), so I’m looking around.
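Roughly the kind of thing I mean (just a sketch, not my actual training loop — the exact penalty form and `penalty_weight` are placeholders I haven’t tuned):

```python
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

penalty_weight = 1e-4  # untuned; too large and the model forgets how to summarize

def loss_with_activation_penalty(src_texts, tgt_texts):
    inputs = tokenizer(src_texts, return_tensors="pt", truncation=True, padding=True)
    labels = tokenizer(tgt_texts, return_tensors="pt", truncation=True, padding=True).input_ids
    outputs = model(**inputs, labels=labels, output_hidden_states=True)

    # Auxiliary loss: mean squared magnitude of every encoder/decoder hidden state,
    # to push activations toward smaller values that fit comfortably in fp16.
    magnitude = torch.zeros((), dtype=outputs.loss.dtype)
    for h in outputs.encoder_hidden_states + outputs.decoder_hidden_states:
        magnitude = magnitude + h.pow(2).mean()

    return outputs.loss + penalty_weight * magnitude
```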