Hey everyone. I trained the model on Swedish (just with the standard parameters), and I'm curious whether we could figure out a good way to fine-tune them.
My WER after 4000 steps was 0.511916 on a 402 MB dataset.
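For anyone comparing numbers: WER is the word-level edit distance (substitutions + insertions + deletions) divided by the number of words in the reference. A minimal pure-Python sketch of the metric (not necessarily the exact implementation the notebook's `load_metric` uses):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("det var en gång", "det var engång"))  # 2 errors / 4 words = 0.5
```

So a WER of 0.51 means roughly every second word in the reference transcript needed an edit to match the prediction.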
I created a spreadsheet; if people fill in the parameters they trained with, maybe we can work out better training settings together.
Out of curiosity, I took the inference part from the notebook and looped over the test set, printing the predicted text next to the original text:
```python
import torch

for i in range(len(common_voice_test["input_values"])):
    input_dict = processor(common_voice_test["input_values"][i],
                           return_tensors="pt", padding=True)
    logits = model(input_dict.input_values.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)[0]
    # index <tab> prediction <tab> reference
    print(str(i) + "\t" + processor.decode(pred_ids) + "\t"
          + common_voice_test_transcription["sentence"][i].lower())
```
I got through 76 lines (out of 2027) before Colaboratory disconnected me, and I had to remove 76 warnings about sampling_rate not being provided. I tried setting sampling_rate=16000, but then I seemed to get a different prediction result (though it could just be that you get a different prediction result each time you run).
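On the "different result each run" point: one likely cause is that the model is still in training mode, so dropout is active and the logits vary between forward passes. Calling `model.eval()` (and wrapping inference in `torch.no_grad()`) should make predictions deterministic. A small self-contained sketch of the effect with a toy module (not the actual Wav2Vec2 model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy network with a dropout layer, standing in for any model that uses dropout.
toy = nn.Sequential(nn.Linear(64, 64), nn.Dropout(p=0.5), nn.Linear(64, 4))
x = torch.randn(1, 64)

# Training mode: dropout randomly zeroes activations, so repeated calls differ.
toy.train()
train_a, train_b = toy(x), toy(x)

# Eval mode: dropout is disabled, so identical input gives identical output.
toy.eval()
with torch.no_grad():  # also skips gradient tracking, which saves memory
    eval_a, eval_b = toy(x), toy(x)

print(torch.equal(eval_a, eval_b))    # True
print(torch.equal(train_a, train_b))  # almost certainly False
```

This wouldn't explain the sampling_rate behaviour by itself, but it's worth ruling out before comparing runs.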