Finetuning wav2vec2-large-xlsr-53 only outputs blank labels

Hi,

When I try to finetune wav2vec2-large-xlsr-53 on the FSC dataset for ASR using the built-in class Wav2Vec2ForCTC, the CTC loss does not converge, and the system outputs only blank labels, even on training instances.

Here is the log of training (overfitting) with only 8 training instances in total:

Epoch 60/1000, Batch 1/1, Total Step = 60, Loss = 26.296, CER = 100.000, 
Gold: ['SWITCH OFF THE LIGHTS', 'TURN THE VOLUME UP']
Pred: ['', '']
Epoch 1000/1000, Batch 1/1, Total Step = 1000, Loss = 2.663, CER = 100.000
Gold: ['SWITCH OFF THE LIGHTS', 'TURN THE VOLUME UP']
Pred: ['', '']

We can see that even at epoch 60, the CTC loss is ~26 and the system outputs only blank labels for training instances. Continuing training to epoch 1000 reduces the CTC loss, but the system still outputs blanks.
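For anyone wondering why the predictions are literally empty strings: CTC greedy decoding merges consecutive repeated frames and then drops blanks, so a model whose argmax is blank at every frame decodes to nothing. A minimal sketch, with an assumed blank id of 0 and a toy vocabulary:

```python
from itertools import groupby

def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeats, then drop blanks."""
    collapsed = [k for k, _ in groupby(frame_ids)]  # merge repeated frames
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)

vocab = {0: "", 1: "H", 2: "I"}          # toy vocabulary, 0 = blank
print(ctc_greedy_decode([1, 1, 0, 2, 2], vocab))  # HI
print(ctc_greedy_decode([0, 0, 0, 0, 0], vocab))  # '' - all blanks
```

So the empty Pred lines above simply mean the argmax is the blank token at every single frame.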

However, if I use an ASR-finetuned model (even one finetuned on Chinese corpora) with exactly the same code, continued finetuning on FSC quickly overfits the training instances. Now the CTC loss is small and the model reproduces the input pretty well:

Epoch 37/1000, Batch 1/1, Total Step = 37, Loss = 0.681, CER = 21.818, 
['CHANGE LANGUAGE', 'TURN THE LIGHTS ON']
['CHANE LANGUAEEI', 'TURN THE LITSH ONT']
Epoch 100/1000, Batch 1/1, Total Step = 100, Loss = 0.028, CER = 2.727, 
['SWITCH OFF THE LIGHTS', 'SWITCH ON THE LIGHTS']
['SWITCH OFF THE LIGHTS', 'SWIITCH ON THE LIGHTSWW']

Here is the code to create new model loading different pretrained models:

from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    args.audio_model,
    gradient_checkpointing=True,
    apply_spec_augment=False,
    vocab_size=processor.tokenizer.vocab_size,
    hidden_dropout=0.05,
    activation_dropout=0.05,
    feat_proj_dropout=0.05,
    layerdrop=0.05,
    final_dropout=0.05,
    mask_time_prob=0.05,
    ctc_loss_reduction='mean',
    ctc_zero_infinity=True,
)

I am using Adam with a learning rate of 1e-4. Both models use the same vocabulary of size ~3k (including Chinese characters). This configuration is exactly the same for both pretrained models, yet they behave differently. Note that I also tried sum for ctc_loss_reduction with XLSR but again got only blanks.
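One thing worth double-checking with a custom ~3k vocabulary: in transformers, Wav2Vec2ForCTC uses config.pad_token_id as the CTC blank index, so the tokenizer's pad token must sit at exactly that index. A minimal torch.nn.CTCLoss sketch with assumed toy shapes (2 utterances, 50 frames, a 32-symbol vocabulary with blank at index 0), mirroring the reduction and zero_infinity settings from the config above:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real setup: 50 frames, 2 utterances, 32 labels.
# Index 0 plays the role of the CTC blank (Wav2Vec2ForCTC takes the blank
# index from config.pad_token_id).
T, N, C = 50, 2, 32
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # (time, batch, vocab)
targets = torch.randint(1, C, (N, 10))                # labels 1..C-1, never blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# reduction='mean' and zero_infinity=True mirror the from_pretrained config
ctc = nn.CTCLoss(blank=0, reduction="mean", zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

If the tokenizer's pad token were not at the index the loss treats as blank, the loss would silently treat a real character as blank, which can also manifest as all-blank decodes.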

Could anybody help me on that? Thank you very much! :slight_smile:

I'm facing the same issue. Did you manage to solve it?

Hey @zzuczy and @zhu-y11 - could you maybe post a link to your repos which include:

  • model config
  • training script
  • tokenizer config

on the Hub (https://huggingface.co/) so that I can inspect the files?

Sorry for the late reply, and thanks for your help! For reasons that are hard to explain, it's difficult for me to upload all the resource files to the Hugging Face Hub. This may take some time; I'll do it as soon as possible.
Thanks again for your willingness to help!

The line is here: debug link
The data I use is Aishell-1, which is a Chinese ASR corpus.

The pretrained model is: pretrained model

BTW, it seems that CTC predicting only blanks is a common problem; here are some examples raised by others:
ctc problem
ctc problem
ctc problem

Some answers said that CTC learns to output blanks first and only later moves on to learning the actual characters. I don't know whether this interpretation is true or not.

Perhaps the problem is that I use a huge vocabulary of 8000+ Chinese characters while the training data is only 170 hours, so the model is hard to train?

UPDATE:
I changed the pretrained model from facebook/wav2vec2-base-100k-voxpopuli to facebook/wav2vec2-base. It works; everything is OK. Maybe there is something wrong with facebook/wav2vec2-base-100k-voxpopuli?