Hi,
I’m running into nan training_loss when training wav2vec2 xlsr with my custom dataset.
Weird thing is that even though training_loss goes to nan, eval_loss still goes down, and error_rate (cer and wer) also goes down.
I’ve experimented with a lower learning_rate, but I’m still getting similar behavior. I’m logging with wandb.
My graphs look like the following:
There’s no value for train/loss after ~60 steps since it is nan, but eval/loss is still decreasing.
Has anyone experienced similar behavior?