Wav2Vec2 For Indian English

I’m trying to build an Automatic Speech Recognition model for Indian English (accents, dialects, etc.). I have around 15 hours of labeled data.

I followed the steps in the blog post by @patrickvonplaten, replacing the TIMIT dataset with my own and keeping everything else the same. After training, the WER is exactly 1.0.

The trained model outputs an empty transcription for every file in the test set, and I don’t know what is going wrong.
Any help would be much appreciated. Is anyone else attempting this?

A WER of 1.0 is not very informative by itself. If the model outputs nothing at all, it probably hasn't learned anything yet, or something is failing silently. Try training for more epochs or tuning other hyperparameters and see if that resolves the issue.
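
To see why all-blank outputs land on exactly 1.0, here is a minimal sketch of word-level WER in plain Python. The helpers `word_error_rate` and `empty_fraction` are names I made up for illustration; this is not the metric code used in the blog post, just the standard edit-distance definition as I understand it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def empty_fraction(predictions: list[str]) -> float:
    """Share of predictions that are empty or whitespace only (hypothetical helper)."""
    return sum(1 for p in predictions if not p.strip()) / len(predictions)


# An empty hypothesis deletes every reference word, so WER is 1.0.
print(word_error_rate("the cat sat", ""))
# A partly wrong hypothesis scores between 0 and 1.
print(word_error_rate("the cat sat", "the dog sat"))
```

If `empty_fraction` over your decoded test set is close to 1.0, the WER is only telling you that the model emits blanks, not how far it is from a usable transcription.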


Thank you for the suggestion. I increased the number of epochs and it fixed the issue.

How is the model working? It would be great if you could open source it.

@Vishaal Any update on the model you are building? It would be great if you could share the solution for the WER 1.0 error.