When I try to finetune wav2vec2-large-xlsr-53 on the FSC dataset for ASR using the built-in Wav2Vec2ForCTC class, the CTC loss does not converge and the model outputs only blank labels, even on training instances.
Here is the training log (overfitting) with only 8 training instances in total:
Epoch 60/1000, Batch 1/1, Total Step = 60, Loss = 26.296, CER = 100.000
Gold: ['SWITCH OFF THE LIGHTS', 'TURN THE VOLUME UP']
Pred: ['', '']
Epoch 1000/1000, Batch 1/1, Total Step = 1000, Loss = 2.663, CER = 100.000
Gold: ['SWITCH OFF THE LIGHTS', 'TURN THE VOLUME UP']
Pred: ['', '']
We can see that even at epoch 60, the CTC loss is ~26 and the model outputs only blank labels for the training instances. Continuing training to epoch 1000 reduces the CTC loss further, but the model still outputs blanks.
However, if I start from an ASR-finetuned model (even one finetuned on Chinese corpora), exactly the same code quickly overfits the FSC training instances, the CTC loss becomes small, and the model reproduces the inputs quite well:
Epoch 37/1000, Batch 1/1, Total Step = 37, Loss = 0.681, CER = 21.818
Gold: ['CHANGE LANGUAGE', 'TURN THE LIGHTS ON']
Pred: ['CHANE LANGUAEEI', 'TURN THE LITSH ONT']
Epoch 100/1000, Batch 1/1, Total Step = 100, Loss = 0.028, CER = 2.727
Gold: ['SWITCH OFF THE LIGHTS', 'SWITCH ON THE LIGHTS']
Pred: ['SWITCH OFF THE LIGHTS', 'SWIITCH ON THE LIGHTSWW']
Here is the code that creates the model from the different pretrained checkpoints:
model = Wav2Vec2ForCTC.from_pretrained(
    args.audio_model,
    gradient_checkpointing=True,
    apply_spec_augment=False,
    vocab_size=processor.tokenizer.vocab_size,
    hidden_dropout=0.05,
    activation_dropout=0.05,
    feat_proj_dropout=0.05,
    layerdrop=0.05,
    final_dropout=0.05,
    mask_time_prob=0.05,
    ctc_loss_reduction='mean',
    ctc_zero_infinity=True,
)
I am using a learning rate of 1e-4. Both models use the same vocabulary of size ~3k (including Chinese characters). The configuration is exactly the same for both pretrained models, yet it yields these different behaviors. Note that I also tried other values of ctc_loss_reduction with the xlsr model but still got the same blanks.
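For reference, the predictions in the logs come from standard greedy CTC decoding (collapse consecutive repeats, then drop blanks), so a model whose per-frame argmax is always the blank token decodes to the empty string, which is exactly what I am seeing. A minimal self-contained sketch of that decoding step (the toy vocabulary and blank index 0 are assumptions for illustration; my real vocabulary has ~3k entries):

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: merge consecutive repeated ids, then drop blanks."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

# Toy vocabulary (an assumption for illustration only).
vocab = {0: "<blank>", 1: "A", 2: "B", 3: " "}

# Frames that are all blank decode to the empty string, matching the failing model.
print(ctc_greedy_decode([0, 0, 0, 0], vocab))     # -> ''
print(ctc_greedy_decode([1, 1, 0, 2, 2], vocab))  # -> 'AB'
```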
Could anybody help me with this? Thank you very much!