I’ve been playing around with the XLSR-53 fine-tuning functionality but I keep getting nan training loss.
The audio files I’m using are (preprocessed roughly as in the sketch after this list):
Down-sampled to 16 kHz
Converted to a single (mono) channel
Between 4 and 10 s in length
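For reference, a minimal sketch of that preprocessing using torchaudio (the file path is just a placeholder):

```python
import torchaudio

# Hypothetical input file; paths will of course differ for the real corpus.
waveform, sample_rate = torchaudio.load("clip.wav")

# Down-mix to mono by averaging the channels, then resample to 16 kHz,
# which is the sampling rate wav2vec2/XLSR-53 expects.
waveform = waveform.mean(dim=0, keepdim=True)
waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16_000)
```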
I’ve set the following hyper-parameters (passed roughly as in the snippet after this list):
attention_dropout=0.1
hidden_dropout=0.1
feat_proj_dropout=0.0
mask_time_prob=0.05
layerdrop=0.1
learning rate:
warmed up to 3e-4 over the first 3 epochs
5e-4 for the next 3 epochs
back down to 3e-4 afterwards
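Concretely, this is roughly how those values are passed when loading the checkpoint (a sketch only; `processor` is assumed to be the usual Wav2Vec2Processor built for my vocabulary, and the learning-rate schedule lives in the training arguments, not here):

```python
from transformers import Wav2Vec2ForCTC

# Sketch of the model setup; `processor` is an assumed Wav2Vec2Processor
# already created for the target language's vocab.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    feat_proj_dropout=0.0,
    mask_time_prob=0.05,
    layerdrop=0.1,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

# Freeze the convolutional feature extractor, as is standard for XLSR fine-tuning.
model.freeze_feature_extractor()
```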
Sadly, I’m fine-tuning the model on an unpublished corpus, so I’m probably not at liberty to upload it here, which might greatly hinder reproducibility efforts.
Here’s what the loss and WER progression looks like:
Anyone know what could be happening here? The model seems to be training just fine, and some testing shows that it performs well on the language I’m training it on. So what’s up with the training loss?
I’m sorry, but without a Google Colab it will be difficult for us to debug this. Given that your WER seems to decrease nicely, there might just be a problem with displaying the values… let’s see whether other people encounter the same problem.
However, this doesn’t mean that the losses displayed after the nan are actually useless → the model can very well keep training. So it’s often more of a display error than an actual error. All in all, my best suggestion here is to just keep an eye on the validation loss and, if it goes down smoothly, continue training.
Someone suggested adding this parameter in hopes of getting rid of this problem:
ctc_zero_infinity=True
The loss is going to be gigantic, and it does hold that every time I faced this issue the first training loss was inf, so this is probably a good fix for the problem!
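For anyone who wants to try it, the flag can be set roughly like this (a sketch; either at load time or on an already-instantiated model’s config):

```python
from transformers import Wav2Vec2ForCTC

# Option 1: set the flag when loading the checkpoint.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_zero_infinity=True,  # zero out infinite CTC losses and their gradients
)

# Option 2: flip it on a model that has already been created.
model.config.ctc_zero_infinity = True
```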
I have the same problem, but I also get an eval_wer of 1.0: at the beginning of training eval_wer was 0.6 and then 0.5, but after 19 epochs it is 1.0 and is still 1.0 at epoch 33.