Pre-training for Wav2Vec2-XLSR via Huggingface

Hi guys! I notice that most topics are related to fine-tuning a pre-trained model. But if I have some new unlabeled data, how can I perform the pre-training process via Huggingface?


Hey Javen,

We now have an official wav2vec2 pre-training example here: transformers/examples/pytorch/speech-pretraining at master · huggingface/transformers · GitHub
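If you just want to see the core API that script is built on, here is a rough sketch of a single pre-training step with Wav2Vec2ForPreTraining. The checkpoint name, the dummy audio, and the masking parameters below are only placeholders for illustration; for a real run, use the script and the hyperparameters from its README.

```python
# Rough sketch of one pre-training step with Wav2Vec2ForPreTraining.
# Checkpoint name, dummy audio and masking parameters are placeholders --
# for a real run, follow the official script and its README.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
model.train()

# Replace with a real 16 kHz waveform from your unlabeled data.
raw_audio = torch.randn(32_000).numpy()  # 2 seconds of dummy audio
input_values = feature_extractor(
    raw_audio, sampling_rate=16_000, return_tensors="pt"
).input_values

# The contrastive objective needs masked time steps and sampled negatives.
batch_size, raw_sequence_length = input_values.shape
sequence_length = model._get_feat_extract_output_lengths(raw_sequence_length).item()
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, sequence_length), mask_prob=0.65, mask_length=10
)
sampled_negative_indices = _sample_negative_indices(
    features_shape=(batch_size, sequence_length),
    num_negatives=model.config.num_negatives,
    mask_time_indices=mask_time_indices,
)
mask_time_indices = torch.tensor(mask_time_indices, device=input_values.device, dtype=torch.long)
sampled_negative_indices = torch.tensor(
    sampled_negative_indices, device=input_values.device, dtype=torch.long
)

# loss = contrastive_loss + diversity_loss_weight * diversity_loss
outputs = model(
    input_values,
    mask_time_indices=mask_time_indices,
    sampled_negative_indices=sampled_negative_indices,
)
outputs.loss.backward()
```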


Hi @patrickvonplaten, I re-ran this script on Google Colab. I passed all the parameters exactly as recommended in the README, but after a few epochs the loss stops decreasing.

| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 5.137e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 8.068e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.952e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 4.017e-19
| loss: 9.969e-02| constrast_loss: 0.000e+00| div_loss: 9.969e-01| %_mask_idx: 4.831e-01| ppl: 2.000e+00| lr: 1.572e-03| temp: 1.902e+00| grad_norm: 5.166e-19
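If I understand the loss computation correctly (loss = contrastive_loss + diversity_loss_weight * diversity_loss, with diversity_loss_weight defaulting to 0.1), then the contrastive loss has gone to zero, the logged loss is just the weighted diversity term, and the tiny grad_norm means the model has basically stopped learning. A rough check of the logged numbers, under that assumption:

```python
# Rough sanity check of the logged numbers, assuming
# loss = contrastive_loss + diversity_loss_weight * diversity_loss
# (diversity_loss_weight defaults to 0.1 in Wav2Vec2Config).
contrastive_loss = 0.0        # constrast_loss in the log
diversity_loss = 9.969e-01    # div_loss in the log
diversity_loss_weight = 0.1
print(contrastive_loss + diversity_loss_weight * diversity_loss)
# 0.09969 -- matches the logged loss of 9.969e-02
```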

Can you take a look? What did I miss?

Hi @patrickvonplaten and @tiena2cva,
Thanks for the new official wav2vec2-pretraining example, it helps a lot!
I had the same problem as @tiena2cva. I tried to re-run the demo script with the same parameters on my own GPU. After a few epochs the contrastive loss decreased to zero and the model stopped changing.
Running inference showed that the quantizer maps all time steps to the same vector (this can be seen in projected_quantized_states), which explains the zero contrastive loss.
I would have thought that the diversity loss weight should be increased, but since I used the parameters given in the README, this behavior is unexpected and may indicate a different problem.
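For reference, this is roughly how I checked the collapse (a quick sketch; "path/to/saved-checkpoint" is a placeholder for the output directory written by the pre-training script):

```python
# Quick check for codebook collapse: run a forward pass and count how many
# distinct quantized vectors the model produces over the time axis.
# "path/to/saved-checkpoint" is a placeholder for the pre-training output dir.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("path/to/saved-checkpoint")
model = Wav2Vec2ForPreTraining.from_pretrained("path/to/saved-checkpoint")
model.eval()

raw_audio = torch.randn(32_000).numpy()  # replace with a real 16 kHz sample
input_values = feature_extractor(
    raw_audio, sampling_rate=16_000, return_tensors="pt"
).input_values

with torch.no_grad():
    outputs = model(input_values)

# projected_quantized_states has shape (batch, time, proj_dim); a healthy
# quantizer should produce many different vectors across time steps.
quantized = outputs.projected_quantized_states[0]
unique_vectors = torch.unique(quantized, dim=0)
print(f"{unique_vectors.shape[0]} distinct quantized vectors over {quantized.shape[0]} time steps")
print("codevector perplexity:", outputs.codevector_perplexity.item())
```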

Can you please help?
