Experiencing a sudden increase in loss when fine-tuning Wav2Vec (see picture attached).
We’ve seen a similar increase in loss due to data quality in the past (e.g. multiple white spaces between words or inaccurate transcriptions), but that doesn’t seem to be the issue now.
If the data is fine, what else can cause this?
I’ve experienced similar loss spikes in the past. They’re usually attributed to exploding gradients, which, as you say, can often be traced back to the quality/noisiness of the training data. I typically try to compensate by adjusting the learning rate schedule, for which you have two dials to play with: the peak learning rate and the number of warm-up steps.
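If you're fine-tuning with 🤗 Transformers' `Trainer`, both dials live in `TrainingArguments`. A minimal sketch (the output directory and step counts here are placeholder values, not prescriptions):

```python
from transformers import TrainingArguments

# Hypothetical values for illustration; tune them for your own run.
training_args = TrainingArguments(
    output_dir="./wav2vec2-finetuned",  # placeholder path
    learning_rate=3e-4,   # dial 1: the peak learning rate
    warmup_steps=5000,    # dial 2: how slowly the LR ramps up
    max_steps=15000,      # total training steps
)
```

By default the `Trainer` uses a linear schedule, so the learning rate climbs from 0 to `learning_rate` over `warmup_steps`, then decays linearly to 0 at `max_steps`.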
I’ve found CTC training to be much more stable when I increase the number of warm-up steps: since the learning rate ramps up more slowly, you push the activations away from the pre-trained range much more gradually, and that usually results in more stable training. I’ve found 5000 warm-up steps to be a good value when training for 15000 total steps or more. In my experience, increasing the warm-up steps has been more effective than lowering the learning rate!
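To see why more warm-up steps mean a gentler ramp, here's the linear warm-up-then-decay schedule written out as a plain function (a sketch of the standard linear schedule; the peak learning rate of 3e-4 is just an illustrative value):

```python
def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warm-up from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        # During warm-up: LR grows proportionally to the current step.
        return peak_lr * step / warmup_steps
    # After warm-up: LR decays linearly, reaching 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

peak = 3e-4  # illustrative peak learning rate

# With 5000 warm-up steps out of 15000 total, at step 1000 you are
# still only at a fifth of the peak LR:
print(linear_warmup_lr(1000, peak, warmup_steps=5000, total_steps=15000))

# With only 500 warm-up steps, step 1000 is already past the peak
# and decaying, so early updates were far larger:
print(linear_warmup_lr(1000, peak, warmup_steps=500, total_steps=15000))
```

Doubling the warm-up steps halves the learning rate at every step of the ramp, which is why stretching the warm-up tames early gradient spikes more directly than shaving a bit off the peak learning rate.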