Portuguese ASR: Fine-Tuning XLSR-Wav2Vec2

Rubens · March 19, 2021, 1:19am

Hi guys, I started this topic on fine-tuning Wav2Vec2 for the Portuguese language. Let’s exchange knowledge and help implement this model at Hugging Face.

Rubens · March 19, 2021, 5:04pm

I’m training on Google Cloud with 2 T4 GPUs, which are quite inexpensive, and using screen to keep the code running even if SSH disconnects for any reason. Steps to run screen properly:

$ script /dev/null
$ screen

ENTER

$ python training.py

CLOSE WINDOW

To check what part of the code is running, SSH in again and run:

$ screen -r

3310.pts-6.training

$ screen -d -r 3310

To quit screen:

$ screen -X -S 3310.pts-6.training quit

Rubens · March 19, 2021, 10:23pm

Four hours of training on GCP, checkpoint 600:

Checkpoint 1800, batch size=8

[Image: WER at checkpoint 1800]

gchhablani · March 20, 2021, 1:28pm

Hi @Rubens,

Do you think it will be possible to do it on Colab?

The dataset looks huge and becomes larger after all the caching of the processed data.

Rubens · March 20, 2021, 8:32pm

Hi Gunjan, I tried to train on Colab but got a “memory full” error. As mentioned in our Slack, the model can be trained on OVH Cloud: https://www.ovh.ie/

Fill in this form: OVH - Wav2Vec2 - Fine Tuning week - GPU accounts - Google Sheets
They will send a voucher code.
More in the Discord channel: https://discord.gg/HaNEhBax

You can also try Google Colab with 24GB RAM and an NVIDIA P100. Make a copy of this notebook and use it: https://colab.research.google.com/drive/1D6krVG0PPJR2Je9g5eN_2h6JP73_NUXz

Rubens · March 20, 2021, 8:40pm

Current state of effort:

My main model’s WER got stuck at 0.3656, which suggests that 3e-4 had become too high a learning rate. This happened at checkpoint 2600, so I decided to use the saved weights to run two models in parallel.

First model: take the original model and decrease the learning rate to 0.8e-4. Result so far at checkpoint 400:

Second model: comment out the call to model.freeze_feature_extractor() (#model.freeze_feature_extractor()). Result so far at checkpoint 400 (this will take much longer to train):

Also, I had GPU issues, so the training is not optimized.

Rubens · March 21, 2021, 1:23am

UPDATE

Model 2 (#model.freeze_feature_extractor() commented out): aborted because of GPU config issues and high computational cost.

Model 1 (learning rate decreased to 0.8e-4): WER = 0.33

Dataset: Common Voice, 1.7GB

[Image: lr 0.8 -- 0.33]

gchhablani · March 22, 2021, 7:53am

Hi Rubens

Nice to know that you found a good set of hyperparameters.
I’m not a native speaker of Portuguese. Did you change the vocab dict in any way?

Rubens · March 22, 2021, 8:59pm

Pushed the first model to git, with WER = 20.41% for the Portuguese language.
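
For readers unfamiliar with the metric: the thread reports WER both as a fraction (0.33) and as a percentage (20.41%). Below is a minimal pure-Python sketch of word error rate for a single reference/hypothesis pair, computed as word-level edit distance divided by the number of reference words. The function name and the example sentences are illustrative and not taken from the training script; metric libraries typically aggregate over a whole evaluation set rather than one pair.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (hypothetical, not from Common Voice).
rate = word_error_rate("o gato bebe leite", "o gato come leite")
print(f"WER = {rate:.2f} ({rate * 100:.2f}%)")
```

Multiplying the fraction by 100 gives the percentage form used in the later posts.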

Rubens · March 23, 2021, 11:33am

Model still training, currently at wer = 19.30

Learning rate at 0.013e-4
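
The thread does not say which learning-rate scheduler was used. If training ran with a linear warmup-and-decay schedule (an assumption), the logged learning rate would shrink toward zero as training approaches its last step, which is one possible reading of a value as small as 0.013e-4. A minimal sketch of such a schedule follows; the warmup and total step counts are hypothetical.

```python
def linear_schedule_lr(step: int, base_lr: float,
                       warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical settings: only base_lr = 0.8e-4 comes from the thread.
BASE_LR = 0.8e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 10_000

for step in (0, 500, 5_000, 9_000, 9_900):
    lr = linear_schedule_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>5}: lr = {lr:.3e}")
```

Under this assumption, a very small logged value says more about how far training has progressed than about the configured base learning rate.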

jolurf · April 16, 2021, 11:20pm

Hi everyone, how’s it going?

I have a question about how to run prediction.
I’m working from an XLSR-Wav2Vec2 fine-tuning model,

and I don’t think it’s optimal to delete the text labels if I’m going to use them for comparison afterwards.

But it’s extremely frustrating, because they don’t explain the input format of the compute_metrics function; they only say it is pred… so it ends up being a black box inside another black box…
I don’t know how to run the forward pass and I haven’t had much exposure to PyTorch, so as soon as I try to predict in batches, that’s it…
all I get is errors…
I’ll end up having to write a function that duplicates what another function does, because I don’t know what preds is…
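
On the shape of pred: when a compute_metrics function is passed to the Hugging Face Trainer, it receives an object with two array fields, predictions (here, logits of shape batch × time × vocabulary) and label_ids (batch × label length, with padded positions set to -100). The sketch below mimics that object in pure Python so the data flow can be followed without PyTorch. The toy vocabulary, the helper names, and the exact-match metric are all hypothetical stand-ins; a real run would use the processor's batch decoding and a WER metric, and the sketch assumes the pad token doubles as the CTC blank.

```python
from types import SimpleNamespace

# Hypothetical toy vocabulary; a real run would use the tokenizer's vocabulary.
VOCAB = {0: "<pad>", 1: "|", 2: "o", 3: "l", 4: "a"}
PAD_ID = 0        # assumed to double as the CTC blank token
IGNORE_ID = -100  # value used to mask padded label positions


def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def ctc_greedy_decode(ids):
    """Collapse repeated ids, drop blanks, map ids to characters."""
    chars, prev = [], None
    for i in ids:
        if i != prev and i != PAD_ID:
            chars.append(VOCAB[i])
        prev = i
    return "".join(chars).replace("|", " ").strip()


def decode_labels(ids):
    """Labels are plain token sequences: drop padding, do not collapse repeats."""
    return "".join(VOCAB[i] for i in ids if i != PAD_ID).replace("|", " ").strip()


def decode_batch(pred):
    # pred.predictions: [batch][time][vocab] logits -> most likely id per frame
    pred_ids = [[argmax(frame) for frame in utt] for utt in pred.predictions]
    # pred.label_ids: [batch][label_len]; -100 marks padding, swap it for PAD_ID
    label_ids = [[PAD_ID if t == IGNORE_ID else t for t in utt]
                 for utt in pred.label_ids]
    return ([ctc_greedy_decode(u) for u in pred_ids],
            [decode_labels(u) for u in label_ids])


def compute_metrics(pred):
    pred_str, label_str = decode_batch(pred)
    # Simplified stand-in for WER: share of utterances not transcribed exactly.
    errors = sum(p != r for p, r in zip(pred_str, label_str))
    return {"exact_match_error": errors / len(label_str)}


def one_hot(i, size=len(VOCAB)):
    """Fake logits for one frame, with all the mass on token i."""
    return [1.0 if k == i else 0.0 for k in range(size)]


# A fake two-utterance batch with the same field names the Trainer provides.
fake_pred = SimpleNamespace(
    predictions=[
        [one_hot(i) for i in [2, 2, 0, 3, 0, 4, 4]],
        [one_hot(i) for i in [4, 0, 3, 3, 0, 0, 0]],
    ],
    label_ids=[
        [2, 3, 4, IGNORE_ID, IGNORE_ID],
        [2, 3, 4, IGNORE_ID, IGNORE_ID],
    ],
)

print(decode_batch(fake_pred))
print(compute_metrics(fake_pred))
```

Because the labels arrive inside pred as label_ids, the reference text can be recovered by decoding them, which may avoid having to keep a separate copy of the text labels for comparison.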
