Pre-training for Wav2Vec2-XLSR via Huggingface

Hi guys! I notice that most topics are related to fine-tuning a pre-trained model. But if I have some new unlabeled data, how can I perform the pre-training process via Huggingface?


Hey Javen,

We now have an official wav2vec2-pretraining example here: transformers/examples/pytorch/speech-pretraining at master · huggingface/transformers · GitHub
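
For anyone wondering what the script optimizes per step: it masks a portion of the feature-encoder outputs, samples negative (distractor) targets, and trains the model to identify the true quantized target among them, with an added diversity loss on the codebook. Below is a minimal sketch of one such forward pass using the public Wav2Vec2ForPreTraining API; the checkpoint name, random audio, and masking parameters are illustrative only (the demo script itself initializes a fresh model from a config).

```python
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

# illustrative checkpoint, just to keep the sketch short
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")

# one second of random audio at 16 kHz stands in for real unlabeled speech
audio = torch.randn(16000).numpy()
input_values = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_values

# mask part of the feature-encoder output and sample distractors
batch_size, raw_len = input_values.shape
seq_len = model._get_feat_extract_output_lengths(raw_len).item()
mask_time_indices = _compute_mask_indices((batch_size, seq_len), mask_prob=0.65, mask_length=10)
sampled_negatives = _sample_negative_indices(
    (batch_size, seq_len), model.config.num_negatives, mask_time_indices=mask_time_indices
)

outputs = model(
    input_values,
    mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.bool),
    sampled_negative_indices=torch.tensor(sampled_negatives, dtype=torch.long),
)
# loss = contrastive loss + diversity_loss_weight * diversity loss
print(outputs.loss, outputs.codevector_perplexity)
```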


Hi @patrickvonplaten, I re-ran this script on Google Colab. I passed all the parameters exactly as recommended in the README, but after some epochs the loss does not decrease.

| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 5.137e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 8.068e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.952e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 4.017e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.831e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 5.166e-19

Can you take a look? What did I miss?

Hi @patrickvonplaten and @tiena2cva,
Thanks for the new official wav2vec2-pretraining example; it helps a lot!
I had the same problem as @tiena2cva: I tried to re-run the demo script with the same parameters on my own GPU, and after a few epochs the contrastive loss dropped to zero and the model stopped changing.
Running inference showed that the quantizer maps every time step to the same vector (this can be seen in projected_quantized_states), which explains the zero contrastive loss.
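
In case it helps others reproduce the check, this is roughly how one can inspect the quantized targets (the checkpoint path is a placeholder for your own output directory):

```python
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForPreTraining

# placeholder path: point this at your own pre-training output directory
ckpt = "./wav2vec2-pretrained-demo"
feature_extractor = AutoFeatureExtractor.from_pretrained(ckpt)
model = Wav2Vec2ForPreTraining.from_pretrained(ckpt).eval()

audio = torch.randn(16000).numpy()  # stand-in for a real utterance
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(inputs.input_values)

# (time, dim) quantized targets; a collapsed quantizer maps every
# time step to (nearly) the same vector
q = outputs.projected_quantized_states[0]
print("unique target vectors:", q.unique(dim=0).shape[0], "of", q.shape[0])
print("mean cosine sim of adjacent targets:",
      torch.cosine_similarity(q[:-1], q[1:], dim=-1).mean().item())
```
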
I would have thought that the diversity loss weight should be increased, but since I used the parameters given in the README file, this behavior is unexpected and may indicate a different problem.
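
For what it's worth, the logged numbers are self-consistent with a collapsed codebook. Assuming the wav2vec2-base defaults (2 codevector groups of 320 codes each, diversity_loss_weight = 0.1), a perplexity of 2, i.e. one active code per group, reproduces the logged losses exactly:

```python
# assumed wav2vec2-base defaults: 2 groups x 320 codes, diversity_loss_weight = 0.1
num_codevectors = 2 * 320                       # 640 codevectors in total
perplexity = 2.0                                # "ppl" in the logs
diversity_loss = (num_codevectors - perplexity) / num_codevectors
contrastive_loss = 0.0                          # "constrast_loss" in the logs
loss = contrastive_loss + 0.1 * diversity_loss
print(f"div_loss = {diversity_loss:.4f}, loss = {loss:.5f}")
# div_loss = 0.9969, loss = 0.09969 (matches the logged values)
```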

Can you please help?


Hi @patrickvonplaten and @sgugger, I am facing the same issue: constrast_loss stops changing after about three backward passes; it goes straight to zero and doesn't change even after many epochs. Any idea how we can reproduce the results? Thanks

Hi @tiena2cva and @ayanas, any update or solution to your problem?