How to train a Hugging Face model with fp16?

Hi, I am using PyTorch and Hugging Face to fine-tune roberta-base on the RTE dataset.
But I find that simply applying torch.half to my model produces NaNs after the first backward pass.
Is there any way to train my model with fp16 without using Hugging Face's Trainer?

Yes, you can do this without the Hugging Face Trainer. You'll want to use PyTorch's native AMP (automatic mixed precision) directly, which is what the HF Trainer itself uses when you set half_precision_backend="amp". Casting everything with .half() alone tends to fail because fp16's narrow dynamic range lets small gradients underflow to zero and large values overflow, which quickly produces NaNs; AMP avoids this by keeping the master weights in fp32, autocasting individual ops to fp16 where it's safe, and applying dynamic loss scaling. The torch.cuda.amp documentation is the place to get started.
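
Here's a minimal sketch of what that loop could look like for your setup, assuming a CUDA GPU and that you load RTE via the datasets library; the hyperparameters (batch size, learning rate, epochs) are just illustrative placeholders:

```python
# Minimal fp16 fine-tuning loop with torch.cuda.amp (no HF Trainer).
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
).to(device)  # weights stay in fp32; autocast handles the fp16 casts

dataset = load_dataset("glue", "rte")

def collate(examples):
    # Tokenize sentence pairs on the fly; RTE has sentence1/sentence2 fields.
    batch = tokenizer(
        [e["sentence1"] for e in examples],
        [e["sentence2"] for e in examples],
        truncation=True, padding=True, return_tensors="pt",
    )
    batch["labels"] = torch.tensor([e["label"] for e in examples])
    return batch

loader = DataLoader(dataset["train"], batch_size=16, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = GradScaler()  # dynamic loss scaling to avoid fp16 gradient underflow

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        with autocast():  # forward pass runs in mixed precision
            loss = model(**batch).loss
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # adjusts the scale factor over time
    print(f"epoch {epoch}: last loss {loss.item():.4f}")
```

The key differences from a bare .half() are that the model parameters remain fp32, autocast chooses the precision per op, and GradScaler rescales the loss so small gradients don't vanish in fp16, which is exactly the failure mode you're hitting.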