`nan` training loss but eval loss does improve over time

I’ve been playing around with the XLSR-53 fine-tuning functionality but I keep getting nan training loss.

Audio files I’m using are:

  • Down-sampled to 16 kHz
  • Converted to a single (mono) channel
  • Between 4 and 10 seconds long
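As a sketch of that preprocessing step, here is a hypothetical helper that down-mixes to mono and resamples to 16 kHz with plain NumPy linear interpolation (the function name and approach are illustrative; a real pipeline would more likely use `torchaudio.transforms.Resample` or `librosa.resample`):

```python
import numpy as np

def to_mono_16k(audio, orig_sr, target_sr=16000):
    """Down-mix to mono and resample via linear interpolation (rough sketch)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # average the channels to get a single mono track
        audio = audio.mean(axis=0)
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

# one second of stereo audio at 44.1 kHz becomes one second of mono at 16 kHz
stereo = np.random.randn(2, 44100)
mono = to_mono_16k(stereo, orig_sr=44100)
print(mono.shape)  # (16000,)
```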

I’ve set the following hyper-params:

  • attention_dropout=0.1
  • hidden_dropout=0.1
  • feat_proj_dropout=0.0
  • mask_time_prob=0.05
  • layerdrop=0.1
  • learning rate:
    • on a warmup schedule to 3e-4 for 3 epochs
    • at 5e-4 for 3 epochs
    • back to 3e-4
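That learning-rate schedule could be sketched as a simple piecewise function (the function name and `steps_per_epoch` are illustrative assumptions, not the actual training script):

```python
def lr_at_step(step, steps_per_epoch, warmup_epochs=3,
               base_lr=3e-4, peak_lr=5e-4):
    """Hypothetical piecewise schedule matching the description:
    linear warmup to base_lr over 3 epochs, peak_lr for 3 epochs,
    then back down to base_lr for the rest of training."""
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr * step / warmup_steps     # warmup to 3e-4
    if step < 2 * warmup_steps:
        return peak_lr                           # 5e-4 for 3 epochs
    return base_lr                               # back to 3e-4

print(lr_at_step(150, steps_per_epoch=100))  # mid-warmup: 1.5e-4
```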

Sadly, I’m fine-tuning the model on an unpublished corpus, so I’m probably not at liberty to upload it here, which might greatly hinder reproducibility efforts.

Here’s what the loss and WER progression looks like:

Anyone know what could be happening here? The model seems to be training just fine and some testing proves that the model performs well on the language I’m training it on. So what’s up with the training loss?

Pinging @patrickvonplaten and @valhalla as this might be relevant to them.

Hey @jjdv,

I’m sorry, but without a Google Colab it will be difficult for us to debug this. Given that your WER seems to decrease nicely, there might just be a problem with displaying the values… let’s see whether other people encounter the same problem.

Hey @patrickvonplaten!

I forgot to attach the notebook to my post. (I’m not fine-tuning on Colab, so feel free to just import the notebook there.)

Again, not sure how useful it would be since the data isn’t available publicly (yet!)

Here’s the notebook!


I looked a bit into it and the problem is the following:

If one loss becomes nan or inf, all subsequently displayed losses also become nan or inf, since the shown loss is the running average of all losses seen so far; see: transformers/trainer.py at 82b8d8c7b02562695f88be81cf0993972e324874 · huggingface/transformers · GitHub

However, this doesn’t mean that the losses after the nan is displayed are actually useless → the model can very well keep training. So it’s often more of a display error than an actual error. All in all, my best suggestion here is to keep an eye on the validation loss: if it goes down smoothly, continue training.
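A minimal illustration of why a single nan batch poisons the displayed running average for the rest of training, even though the later per-batch losses are perfectly finite:

```python
import math

# per-batch training losses; one bad batch produces nan
losses = [2.3, 1.8, float("nan"), 1.2, 0.9]

running = []
total = 0.0
for i, loss in enumerate(losses, start=1):
    total += loss
    running.append(total / i)  # what the progress bar would display

# from the nan batch onward, every displayed average is nan
print(running)
```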


Someone suggested adding this parameter in hopes of getting rid of this problem:

ctc_zero_infinity=True

The loss on those bad batches is gigantic, and every time I faced this issue the first training loss was inf, so zeroing out infinite CTC losses is probably a good fix for the issue!
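For context, `ctc_zero_infinity=True` in the model config maps down to PyTorch's `torch.nn.CTCLoss(zero_infinity=True)`, which replaces infinite losses with zero so a single degenerate batch can't poison the running average. A pure-Python sketch of that idea (the helper is hypothetical, not the actual HF/PyTorch code):

```python
import math

def zero_infinity(batch_losses):
    """Replace inf losses with 0.0, mimicking the zero_infinity behaviour
    of CTC loss (hypothetical illustration, not library code)."""
    return [0.0 if math.isinf(loss) else loss for loss in batch_losses]

# an inf from one impossible alignment no longer dominates the average
batch = [float("inf"), 1.4, 0.8]
avg = sum(zero_infinity(batch)) / len(batch)
print(avg)
```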