Hi guys, I started this topic regarding fine-tuning Wav2Vec2 for the Portuguese language. Let’s exchange knowledge and help each other implement this model at HuggingFace.
I’m training on Google Cloud with 2 T4 GPUs, which are quite inexpensive, and using screen
to keep the code running even if SSH disconnects for any reason. Steps to run screen properly:
$ script /dev/null
$ screen              # press ENTER at the splash screen
$ python training.py
Now close the window; the session keeps running in the background.
To check what part of the code is running, SSH in again and run:
$ screen -r
3310.pts-6.training          # the session name screen reports
$ screen -d -r 3310          # detach it from the old terminal and reattach here
To quit screen:
$ screen -X -S 3310.pts-6.training quit
Hi @Rubens,
Do you think it will be possible to do it on Colab?
The dataset looks huge and becomes larger after all the caching of the processed data.
Hi Gunjan, I tried to train on Colab but got a “memory full” error. As discussed in our Slack, the model can be trained on OVH Cloud: https://www.ovh.ie/
- fill in this form: OVH - Wav2Vec2 - Fine-Tuning Week - GPU accounts (Google Sheets)
- they will send you a voucher code
- more info in the Discord channel: https://discord.gg/HaNEhBax
You can also try Google Colab with 24 GB of RAM and an NVIDIA P100; make a copy of this notebook and use it: https://colab.research.google.com/drive/1D6krVG0PPJR2Je9g5eN_2h6JP73_NUXz
Current state of effort:
My main model’s WER got stuck at 0.3656: 3e-4 turned out to be too high a learning rate. This happened at checkpoint 2600, so I decided to use the saved weights to run two models in parallel.
First model: take the original model and decrease the learning rate to 0.8e-4 (result so far at checkpoint 400).
Second model: comment out model.freeze_feature_extractor() so the feature extractor is trained as well (result so far at checkpoint 400; this will take much longer to train).
Also, I had GPU issues, so the training is not optimized. A sketch of the two configurations follows.
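For reference, here is a minimal sketch of the two runs described above, assuming the stock XLSR-Wav2Vec2 fine-tuning recipe from the HuggingFace examples; only the learning rate, the save interval, and the freeze call come from this thread, everything else is a placeholder.

from transformers import Wav2Vec2ForCTC, TrainingArguments

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")

# Model 1: keep the CNN feature extractor frozen and lower the learning rate.
model.freeze_feature_extractor()

training_args = TrainingArguments(
    output_dir="wav2vec2-xlsr-pt",   # placeholder
    per_device_train_batch_size=16,  # placeholder
    num_train_epochs=30,             # placeholder
    learning_rate=0.8e-4,            # down from 3e-4, which plateaued at WER 0.3656
    save_steps=400,                  # checkpoint every 400 steps, as in the posts above
    fp16=True,                       # assumes a CUDA GPU
)

# Model 2 is identical except it does NOT call freeze_feature_extractor(),
# so the feature extractor is fine-tuned too (slower, heavier on GPU memory).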
UPDATE
Model 2 (commented out model.freeze_feature_extractor()) aborted: GPU config issues, plus it was computationally expensive.
Model 1 (learning rate decreased to 0.8e-4): WER = 0.33
Dataset = Common Voice = 1.7 GB
Hi Rubens
Nice to know that you found a good set of hyperparameters.
I’m not a native speaker of Portuguese; did you change the vocab dict in any manner?
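For context, here is a sketch of how the reference notebook builds the vocab dict from the dataset transcriptions; for Portuguese the main point is that the cleaning regex must strip punctuation only, so accented letters (ã, ç, é, …) survive into the vocab. The dataset split and column names follow the standard Common Voice setup and are assumptions here.

import re
from datasets import load_dataset

common_voice_train = load_dataset("common_voice", "pt", split="train+validation")

chars_to_remove = r"[\,\?\.\!\-\;\:\"]"  # punctuation only; keep accented letters

def remove_special_characters(batch):
    batch["sentence"] = re.sub(chars_to_remove, "", batch["sentence"]).lower()
    return batch

def extract_all_chars(batch):
    all_text = " ".join(batch["sentence"])
    return {"vocab": [list(set(all_text))]}

common_voice_train = common_voice_train.map(remove_special_characters)
vocabs = common_voice_train.map(extract_all_chars, batched=True, batch_size=-1,
                                keep_in_memory=True,
                                remove_columns=common_voice_train.column_names)

vocab_dict = {v: k for k, v in enumerate(sorted(set(vocabs["vocab"][0])))}
vocab_dict["|"] = vocab_dict[" "]   # CTC uses | as the word delimiter
del vocab_dict[" "]
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)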
Hi everyone, how’s it going?
I’m not sure how to run prediction.
I’m working from an XLSR-Wav2Vec2 fine-tuning model,
and I don’t think it’s optimal to delete the text labels if I’m going to use them later for comparison.
But it’s extremely draining, because they don’t explain the input format for the compute_metrics function; they only say it’s pred… so it’s a black box inside another black box…
I don’t know how to do the forward pass and I haven’t had meaningful contact with PyTorch, so when I try to run prediction in batches, that’s it…
it just errors out…
I’ll end up having to write a function that repeats what another function does, because I don’t know what preds is…
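For what it’s worth, here is a minimal sketch of both pieces, assuming the standard XLSR-Wav2Vec2 fine-tuning notebook setup; the checkpoint name and audio path are placeholders, and processor/wer_metric are defined here only for completeness (the notebook builds them itself).

import numpy as np
import torch
import torchaudio
from datasets import load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("your-username/wav2vec2-xlsr-pt")  # placeholder
model = Wav2Vec2ForCTC.from_pretrained("your-username/wav2vec2-xlsr-pt")         # placeholder
model.eval()

# Load one utterance and resample to the 16 kHz the model expects.
speech, sr = torchaudio.load("example.wav")  # placeholder path
speech = torchaudio.functional.resample(speech, sr, 16_000).squeeze().numpy()

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():                            # the "forward pass"
    logits = model(inputs.input_values).logits   # shape: (batch, time, vocab)

pred_ids = torch.argmax(logits, dim=-1)          # greedy CTC decoding
print(processor.batch_decode(pred_ids))

# Inside the Trainer, compute_metrics receives an EvalPrediction object:
# pred.predictions holds the logits, pred.label_ids the padded label ids.
wer_metric = load_metric("wer")

def compute_metrics(pred):
    pred_ids = np.argmax(pred.predictions, axis=-1)
    # -100 marks padding in the labels; swap it back to the pad token before decoding.
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}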