Hello everybody, I’m planning to fine-tune XLSR-Wav2Vec2 on Bemba, and I’m happy to collaborate with anybody willing to join me. I have created this thread so that we can share and discuss issues here.
Summary of dataset details:
- Language: Bemba (or Icibemba) language of Zambia
- Dataset: BembaSpeech (If interested, you can check out the paper for more details).
- Duration: The dataset has a total duration of 24 hours of read speech, already preprocessed and partitioned into train, dev and test sets.
- Size: 2.8 GB
- Subset [optional]: There is also a 17-hour subset of BembaSpeech here, consisting of audio files shorter than 10 seconds.
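If anyone wants to rebuild a similar short-clip subset from the full dataset, here is a minimal sketch using only Python's standard `wave` module. It assumes the audio is stored as PCM WAV files; the function names and the 10-second default are just illustrative, not how the published subset was actually produced.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # duration = number of audio frames / frames per second
        return wav_file.getnframes() / wav_file.getframerate()


def filter_short_clips(paths, max_seconds=10.0):
    """Keep only the files strictly shorter than max_seconds (assumed cutoff)."""
    return [p for p in paths if wav_duration_seconds(p) < max_seconds]
```

You would call `filter_short_clips` on the list of WAV paths for each split (train, dev, test) and keep only the returned files.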
So far, I have only quickly tried fine-tuning on the 17-hour subset using the parameters that came with @patrickvonplaten's notebook, but I ran into a vanishing/exploding gradient problem. So yeah, I need to tweak a few parameters. Get in touch if you are willing to join in… I'm happy to collaborate with anyone in the community.
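One common thing to try for exploding gradients, alongside a lower learning rate, is clipping the gradient norm. The sketch below shows the idea in plain Python on a flat list of gradient values; `clip_by_global_norm` and the `max_norm=1.0` default are illustrative assumptions, not settings from the notebook. In practice you would rely on the framework's own version (for example `torch.nn.utils.clip_grad_norm_` in PyTorch, or the `max_grad_norm` training argument in Transformers).

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradient values so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        # inf/nan gradients cannot be repaired by rescaling
        raise ValueError("non-finite gradient norm")
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm
```

Logging the returned pre-clipping norm at each step is a cheap way to see whether the gradients are actually blowing up or shrinking toward zero before changing other parameters.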