Hello everybody, I’m planning on fine-tuning XLSR-Wav2Vec2 on Bemba. I’m happy to collaborate with anybody willing to join me, so I have created this thread so that we can share progress and discuss issues here.
Summary dataset details:
Language: Bemba (or Icibemba), a language of Zambia
Dataset: BembaSpeech (If interested, you can check out the paper for more details).
Duration: The dataset has a total duration of 24 hours of read speech, already preprocessed and partitioned into train, dev and test sets.
Size: 2.8 GB
Subset [optional]: There is also a 17-hour subset of BembaSpeech here, consisting of audio files shorter than 10 seconds (see the filtering sketch below).
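In case it helps anyone preparing the data, here is a minimal sketch of how such a duration filter could be built with the `datasets` library. The CSV path and the `audio_path`/`sentence` column names are assumptions for illustration, not the actual BembaSpeech layout:

```python
# Minimal sketch of building a <10 s subset with the datasets library.
# Assumptions: one CSV per split with "audio_path" and "sentence" columns;
# adjust paths/column names to the actual BembaSpeech layout.
import soundfile as sf
from datasets import load_dataset

dataset = load_dataset("csv", data_files={"train": "bembaspeech_train.csv"})["train"]

def add_duration(batch):
    # soundfile only reads the file header, so this stays cheap even for 24 h of audio
    batch["duration"] = sf.info(batch["audio_path"]).duration
    return batch

dataset = dataset.map(add_duration)
# Keep only clips shorter than 10 seconds, mirroring the 17 h subset above
subset = dataset.filter(lambda batch: batch["duration"] < 10.0)
print(f"{len(subset)} / {len(dataset)} clips kept")
```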
Progress:
So far, I have just quickly tried to fine-tune on the 17-hour subset using the parameters that came with @patrickvonplaten's notebook, but ran into a vanishing/exploding gradient problem. So yeah, I need to tweak a few parameters. Get in touch if you are willing to join in… I'm happy to collaborate with anyone in the community.
So I have been training using the 17-hour subset [optional] of BembaSpeech: train, dev and test.
To test the waters, I first trained using the dev and test sets only. The training went on without a problem. However, when I decided to include the training set and evaluate on the dev set… I started getting nan results.
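One thing worth ruling out is problematic samples in the train split, such as empty transcriptions or unusually long/short clips; in particular, a transcription longer than the audio can support after downsampling makes the CTC loss go to inf. A rough sketch of such a check, reusing the `sentence` and `duration` columns assumed in the loading sketch above:

```python
# Rough sanity check on the train split, reusing the "sentence" and "duration"
# columns assumed in the loading sketch above (adjust to the real column names).
empty = subset.filter(lambda batch: len(batch["sentence"].strip()) == 0)
too_long = subset.filter(lambda batch: batch["duration"] > 10.0)
too_short = subset.filter(lambda batch: batch["duration"] < 0.5)

print(f"empty transcriptions: {len(empty)}")
print(f"clips over 10 s:      {len(too_long)}")
print(f"clips under 0.5 s:    {len(too_short)}")

# wav2vec2 emits roughly 50 output frames per second of 16 kHz audio, so a
# transcription with more characters than that is impossible for CTC to align.
impossible = subset.filter(lambda batch: len(batch["sentence"]) > batch["duration"] * 50)
print(f"transcriptions too long for their clip: {len(impossible)}")
```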
Hey Claytone, I would suggest playing around a bit with learning_rate and dropout. I'd try both reducing and increasing the learning rate… and reducing dropout if you keep getting nan for the training loss.
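To make that concrete, here is a rough sketch of where those knobs live when fine-tuning XLSR-Wav2Vec2 with `transformers`, plus two related guards (gradient clipping and `ctc_zero_infinity`) that are often used against nan/inf losses. The values are placeholders to sweep over, not recommendations, and the processor path is an assumption:

```python
# Rough sketch of the learning_rate and dropout knobs for XLSR-Wav2Vec2
# fine-tuning. Values are placeholders to sweep, not recommendations;
# the processor path below is an assumption.
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor, TrainingArguments

processor = Wav2Vec2Processor.from_pretrained("./wav2vec2-xlsr-bemba-processor")

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.05,    # dropout knobs: try lowering these if the loss hits nan
    hidden_dropout=0.05,
    feat_proj_dropout=0.0,
    layerdrop=0.05,
    ctc_loss_reduction="mean",
    ctc_zero_infinity=True,    # extra guard: zero out infinite CTC losses instead of propagating them
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_extractor()  # the convolutional feature extractor is usually kept frozen

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-bemba",
    learning_rate=1e-4,            # sweep in both directions, e.g. 3e-5, 1e-4, 3e-4
    warmup_steps=500,
    max_grad_norm=1.0,             # extra guard: gradient clipping against exploding gradients
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=30,
    fp16=True,
    evaluation_strategy="steps",
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
)
```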
Thank you @patrickvonplaten. I will try that too. Is there a restriction on the maximum and minimum durations (lengths) of the audio files the model accepts? Just in case…