Portuguese ASR: Fine-Tuning XLSR-Wav2Vec2

Hi guys, I started this topic about fine-tuning Wav2Vec2 for the Portuguese language. Let’s exchange knowledge and help implement this model at HuggingFace :hugs:


I’m training on Google Cloud with 2 T4 GPUs, which are quite inexpensive, and using screen to keep the code running even if the SSH session disconnects for any reason. Steps to run screen properly:

$ script /dev/null
$ screen
$ python training.py

To check which part of the code is running, SSH in again and run:

$ screen -r

If the session is still attached elsewhere, detach it and reattach by id:

$ screen -d -r 3310

To quit screen:

$ screen -X -S 3310.pts-6.training quit

Four hours of training on GCP, checkpoint 600:

Checkpoint 1800, batch size=8


Hi @Rubens,

Do you think it will be possible to do it on Colab?

The dataset looks huge and becomes larger after all the caching of the processed data.

Hi Gunjan, I tried to train on Colab but got an out-of-memory error. As discussed in our Slack, the model can be trained on OVH Cloud: https://www.ovh.ie/

You can also try Google Colab with 24 GB of RAM and an NVIDIA P100: make a copy of this notebook and use it: https://colab.research.google.com/drive/1D6krVG0PPJR2Je9g5eN_2h6JP73_NUXz


Current state of effort:

My main model’s WER got stuck at 0.3656; the learning rate of 3e-4 turned out to be too high. This happened at checkpoint 2600, so I decided to use the saved weights to run two models in parallel.

First model: take the original model and decrease the learning rate to 0.8e-4. Result so far at checkpoint 400:

Second model: comment out model.freeze_feature_extractor(). Result so far at checkpoint 400 (this will take much longer to train):

Also, I had GPU issues, so the training is not optimized.
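The two variants above boil down to one changed argument and one commented-out line. A minimal sketch, assuming the standard Trainer setup from the XLSR fine-tuning notebook (the output dir, save interval, and checkpoint paths here are illustrative, not the actual script):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-portuguese",
    per_device_train_batch_size=8,
    learning_rate=0.8e-4,  # lowered from 3e-4, which stalled at WER 0.3656
    save_steps=200,        # checkpoint interval (assumed)
)

# Model 1: keep the CNN feature extractor frozen (cheaper to train):
#     model.freeze_feature_extractor()
# Model 2: comment that call out so the feature extractor is fine-tuned
# as well, at a much higher compute cost. Either way, training can pick
# up from the weights saved at the stall:
#     trainer.train(resume_from_checkpoint="<output_dir>/checkpoint-2600")
```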



Model 2 (model.freeze_feature_extractor() commented out) aborted: GPU configuration issues, plus it is computationally expensive.

Model 1 (learning rate decreased to 0.8e-4): WER = 0.33

Dataset: Common Voice, 1.7 GB


Hi Rubens,

Nice to know that you found a good set of hyperparameters.
I’m not a native speaker of Portuguese; did you change the vocab dict in any way?
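For reference, the Portuguese run should not need special vocab handling: the notebook's approach of collecting every character from the transcripts picks up accented letters (á, ç, ã, …) automatically. A minimal sketch of that vocab-building step (the function name is mine, not from the notebook):

```python
def build_vocab(sentences):
    """Collect every character in the transcripts into a char -> id map."""
    chars = sorted(set("".join(sentences)))
    vocab = {c: i for i, c in enumerate(chars)}
    # Wav2Vec2's CTC tokenizer uses "|" as the word delimiter, not " "
    vocab["|"] = vocab.pop(" ")
    # special tokens appended at the end of the vocabulary
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab

vocab = build_vocab(["olá mundo", "ação e reação"])
print("á" in vocab, " " in vocab)  # -> True False
```

Accented characters end up in the vocab like any other character, so no manual edits to the dict are required for Portuguese.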

Git pushed the first model, with WER = 20.41%, for the Portuguese language.

The model is still training, currently at WER = 19.30%.

Learning rate now at 0.013e-4.

Hi everyone, how’s it going?

I’m not sure how to run prediction. I’m working from an XLSR-Wav2Vec2 fine-tuning model,

and I don’t think it’s optimal to delete the text labels if I’m going to use them later for comparison.

But it’s extremely tiring, because they don’t explain the input format for the compute_metrics function; they only say it’s pred… so it’s a black box inside another black box.
I don’t know how to do the forward pass, and I haven’t had much contact with PyTorch, so when I try to run prediction in batches, it just throws errors.
I’ll end up having to write a function that repeats what another function does, because I don’t know what preds is…
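If it helps: in the Trainer setup, `pred` is an `EvalPrediction` whose `pred.predictions` field holds the raw logits (batch, time, vocab) and whose `pred.label_ids` holds the label ids, with padding masked as -100. Decoding is just an argmax per frame followed by CTC collapsing, which `processor.batch_decode` does for you. A toy sketch of that decoding step with a made-up id map (the vocabulary here is an assumption for illustration, not the real tokenizer):

```python
import numpy as np

# toy id -> char map standing in for processor.tokenizer
id2char = {0: "<pad>", 1: "a", 2: "b", 3: "c", 4: "|"}  # "|" = word boundary
PAD_ID = 0

def greedy_ctc_decode(logits):
    """Argmax per frame, collapse repeats, drop pad: what batch_decode does."""
    ids = np.argmax(logits, axis=-1)
    out, prev = [], None
    for i in ids:
        if i != prev and i != PAD_ID:
            out.append(int(i))
        prev = i
    return "".join(id2char[i] for i in out).replace("|", " ").strip()

# six frames spelling "a a b | c c" collapse to "ab c"
logits = np.full((6, 5), -10.0)
for t, i in enumerate([1, 1, 2, 4, 3, 3]):
    logits[t, i] = 10.0
print(greedy_ctc_decode(logits))  # -> ab c
```

Inside compute_metrics the same idea applies: take the argmax of `pred.predictions`, decode it with the processor, replace the -100 values in `pred.label_ids` with the pad token id before decoding the references, and then compare the two strings with the WER metric.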