Swedish ASR: Fine-Tuning Wav2Vec2

Hey everyone. I trained the model on Swedish (just with the default parameters) and I'm curious whether we could figure out a good way to fine-tune the hyperparameters.
My WER after 4000 steps was 0.511916 on a dataset of 402 MB.
I created a spreadsheet; if people could fill in the parameters they trained with, maybe we could figure out better settings for training. :heart:
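For anyone comparing numbers in the sheet, WER is just the word-level edit distance divided by the number of reference words. A minimal pure-Python sketch (the Swedish sentences are made-up examples, not from the dataset):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

print(wer("jag heter anna", "jag heter ana"))  # 1 substitution / 3 words ≈ 0.333
```

In practice the notebook uses the `wer` metric from the `datasets` library, which computes the same quantity.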

Here is a link to my Google Colaboratory.

I ran the same training again tonight (didn't change any parameters, but did filter out apostrophes) and got a WER of 0.514714.
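Filtering out the apostrophe (along with other punctuation) can be done with a small regex in the transcript-cleaning step. A sketch, assuming a `sentence` column as in the Common Voice notebooks; the exact character set here is my assumption, not the one I actually used:

```python
import re

# Hypothetical character set to strip; adjust to taste
chars_to_remove_regex = r"""[,?.!\-;:"'’]"""

def remove_special_characters(batch):
    # Strip punctuation/apostrophes and lowercase the transcript
    batch["sentence"] = re.sub(chars_to_remove_regex, "", batch["sentence"]).lower()
    return batch

print(remove_special_characters({"sentence": "Hej! Vad heter du?"})["sentence"])  # hej vad heter du
```

This would typically be applied with `dataset.map(remove_special_characters)`.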

Out of curiosity I took the inference part from the notebook and looped over the test set, printing the predicted text together with the original text.

import torch

model.eval()  # disable dropout so predictions are deterministic
with torch.no_grad():
  for i in range(len(common_voice_test["input_values"])):
    # pass sampling_rate explicitly to avoid the "sampling_rate not provided" warning
    input_dict = processor(common_voice_test["input_values"][i], sampling_rate=16_000,
                           return_tensors="pt", padding=True)

    logits = model(input_dict.input_values.to("cuda")).logits

    pred_ids = torch.argmax(logits, dim=-1)[0]
    print(str(i) + "\t" + processor.decode(pred_ids) + "\t"
          + common_voice_test_transcription["sentence"][i].lower())

I got 76 lines (out of 2027) before Colaboratory disconnected me, and each line came with a warning about sampling_rate not being provided. I tried setting sampling_rate=16000, but then it looked like I got a different prediction result (although it could just be that you get a different prediction result on each run).
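To avoid losing everything when Colab disconnects, one option is to append each prediction line to a file on disk (or on mounted Drive) as it is produced. A small sketch; the function name and file path are made up:

```python
def log_prediction(path, index, predicted, reference):
    """Append one tab-separated prediction line so progress survives a disconnect."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{index}\t{predicted}\t{reference.lower()}\n")

# Example call, mirroring the print in the loop above
log_prediction("predictions.tsv", 0, "jag heter anna", "Jag heter Anna")
```

Calling this inside the inference loop instead of (or in addition to) `print` means a disconnect at line 76 still leaves the first 76 results on disk.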

Did any of you start looking at using the NST database (I can see that it's listed in the sheet)? Maybe this would be good to collaborate on?

The KB labb trained a model using the NST database that currently has the lowest WER.

The model was trained only on NST, so a good next step might be to train on both NST and Common Voice.