Pre-training for Wav2Vec2-XLSR via Huggingface

Hi guys! I notice that most topics are related to fine-tuning a pre-trained model. But if I have some new unlabeled data, how can I perform the pre-training process via Huggingface?


Hey Javen,

We now have an official wav2vec2 pre-training example here: transformers/examples/pytorch/speech-pretraining at master · huggingface/transformers · GitHub
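If you just want to see the core API that script is built on, here is a rough sketch of a single pre-training step with Wav2Vec2ForPreTraining. The checkpoint name, the dummy audio, and the masking parameters below are only placeholders for illustration; for a real run, use the script and the hyperparameters from its README.

```python
# Rough sketch of one pre-training step with Wav2Vec2ForPreTraining.
# Checkpoint name, dummy audio and masking parameters are placeholders --
# for a real run, follow the official script and its README.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
model.train()

# Replace with a real 16 kHz waveform from your unlabeled data.
raw_audio = torch.randn(32_000).numpy()  # 2 seconds of dummy audio
input_values = feature_extractor(
    raw_audio, sampling_rate=16_000, return_tensors="pt"
).input_values

# The contrastive objective needs masked time steps and sampled negatives.
batch_size, raw_sequence_length = input_values.shape
sequence_length = model._get_feat_extract_output_lengths(raw_sequence_length).item()
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, sequence_length), mask_prob=0.65, mask_length=10
)
sampled_negative_indices = _sample_negative_indices(
    features_shape=(batch_size, sequence_length),
    num_negatives=model.config.num_negatives,
    mask_time_indices=mask_time_indices,
)
mask_time_indices = torch.tensor(mask_time_indices, device=input_values.device, dtype=torch.long)
sampled_negative_indices = torch.tensor(
    sampled_negative_indices, device=input_values.device, dtype=torch.long
)

# loss = contrastive_loss + diversity_loss_weight * diversity_loss
outputs = model(
    input_values,
    mask_time_indices=mask_time_indices,
    sampled_negative_indices=sampled_negative_indices,
)
outputs.loss.backward()
```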


Hi @patrickvonplaten, I re-ran this script on Google Colab. I passed all the parameters exactly as recommended in the README, but after a few epochs the loss stops decreasing.

| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 5.137e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 8.068e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.952e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 4.017e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.831e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 5.166e-19
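If I understand the loss computation correctly (loss = contrastive_loss + diversity_loss_weight * diversity_loss, with diversity_loss_weight defaulting to 0.1), then the contrastive loss has gone to zero, the logged loss is just the weighted diversity term, and the tiny grad_norm means the model has basically stopped learning. A rough check of the logged numbers, under that assumption:

```python
# Rough sanity check of the logged numbers, assuming
# loss = contrastive_loss + diversity_loss_weight * diversity_loss
# (diversity_loss_weight defaults to 0.1 in Wav2Vec2Config).
contrastive_loss = 0.0        # constrast_loss in the log
diversity_loss = 9.969e-01    # div_loss in the log
diversity_loss_weight = 0.1
print(contrastive_loss + diversity_loss_weight * diversity_loss)
# 0.09969 -- matches the logged loss of 9.969e-02
```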

Can you take a look? What did I miss?

Hi @patrickvonplaten and @tiena2cva,
Thanks for the new official wav2vec2-pretraining example, it helps a lot!
I had the same problem as @tiena2cva. I tried to re-run the demo script with the same parameters on my own GPU. After a few epochs the contrastive loss decreased to zero and the model stopped changing.
Running inference showed that the quantizer maps all time steps to the same vector (this can be seen in projected_quantized_states), which explains the zero contrastive loss.
I would have thought that the diversity loss weight should be increased, but since I used the parameters given in the README, this behavior is unexpected and may indicate a different problem.
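For reference, this is roughly how I checked the collapse (a quick sketch; "path/to/saved-checkpoint" is a placeholder for the output directory written by the pre-training script):

```python
# Quick check for codebook collapse: run a forward pass and count how many
# distinct quantized vectors the model produces over the time axis.
# "path/to/saved-checkpoint" is a placeholder for the pre-training output dir.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("path/to/saved-checkpoint")
model = Wav2Vec2ForPreTraining.from_pretrained("path/to/saved-checkpoint")
model.eval()

raw_audio = torch.randn(32_000).numpy()  # replace with a real 16 kHz sample
input_values = feature_extractor(
    raw_audio, sampling_rate=16_000, return_tensors="pt"
).input_values

with torch.no_grad():
    outputs = model(input_values)

# projected_quantized_states has shape (batch, time, proj_dim); a healthy
# quantizer should produce many different vectors across time steps.
quantized = outputs.projected_quantized_states[0]
unique_vectors = torch.unique(quantized, dim=0)
print(f"{unique_vectors.shape[0]} distinct quantized vectors over {quantized.shape[0]} time steps")
print("codevector perplexity:", outputs.codevector_perplexity.item())
```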

Can you please help?
