MobileBERT decoder returns nans when using fp16 (amp)

tblattner · April 19, 2021, 10:07pm

When fine-tuning MobileBERT for token classification with FP16 (AMP), the encoder inside MobileBERT's forward call returns NaN values, which eventually makes the loss NaN as well. This does not occur in FP32 mode. It happens as early as the first mini-batch (observed through a debugger).
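For anyone who wants to narrow this down without stepping through a debugger, here is a rough sketch of how the first offending submodule could be located with PyTorch forward hooks. The helper name `find_nonfinite_modules` and the toy model are made up for illustration; nothing here is MobileBERT-specific, and the toy model forces a NaN on purpose rather than reproducing the FP16 problem.

```python
import torch
from torch import nn


def find_nonfinite_modules(model: nn.Module, *inputs):
    """Run one forward pass and return the names of submodules whose output
    contains NaN or Inf, in the order they executed. (Hypothetical helper.)"""
    bad, handles = [], []

    def make_hook(name):
        def hook(module, args, output):
            # Outputs may be a tensor, a tuple/list, or a dict-like object.
            if isinstance(output, dict):
                outs = list(output.values())
            elif isinstance(output, (tuple, list)):
                outs = list(output)
            else:
                outs = [output]
            for out in outs:
                if torch.is_tensor(out) and not torch.isfinite(out).all():
                    bad.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for handle in handles:
            handle.remove()  # always detach the hooks again
    return bad


class BadLayer(nn.Module):
    """Toy layer that always emits NaN (sqrt of a negative number)."""
    def forward(self, x):
        return torch.sqrt(-torch.ones_like(x))


# Toy stand-in for a real model: the middle layer produces NaN on purpose.
toy = nn.Sequential(nn.Linear(4, 4), BadLayer(), nn.Linear(4, 2))
bad = find_nonfinite_modules(toy, torch.randn(3, 4))
print(bad)  # first entry is the earliest module that produced a non-finite value
```

With a real model, the same helper would be called inside the autocast context on one FP16 batch; the first name in the returned list is the place to start looking.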

I was able to reproduce this with the run_ner example and reported it on GitHub: "run_ner.py example MobileBERT FP16 returns nan loss", Issue #11327 in huggingface/transformers.

@lysandre
