Hello everybody, I’m planning on fine-tuning XLSR-Wav2Vec2 on Bemba. I’m happy to collaborate with anybody willing to join me, so I have created this thread so that we can share progress and discuss issues here.
Summary dataset details:
Language: Bemba (or Icibemba), a language of Zambia
Dataset: BembaSpeech (If interested, you can check out the paper for more details).
Duration: The dataset has a total duration of 24 hours of read speech, already preprocessed and partitioned into train, dev and test sets.
Size: 2.8 GB
Subset [optional]: There is also a 17-hour subset of BembaSpeech here, consisting of audio files shorter than 10 seconds (see the filtering sketch below).
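In case it helps anyone preparing the data, here is a minimal sketch of how such a duration filter could be built with the `datasets` library. The CSV path and the `audio_path`/`sentence` column names are assumptions for illustration, not the actual BembaSpeech layout:

```python
# Minimal sketch of building a <10 s subset with the datasets library.
# Assumptions: one CSV per split with "audio_path" and "sentence" columns;
# adjust paths/column names to the actual BembaSpeech layout.
import soundfile as sf
from datasets import load_dataset

dataset = load_dataset("csv", data_files={"train": "bembaspeech_train.csv"})["train"]

def add_duration(batch):
    # soundfile only reads the file header, so this stays cheap even for 24 h of audio
    batch["duration"] = sf.info(batch["audio_path"]).duration
    return batch

dataset = dataset.map(add_duration)
# Keep only clips shorter than 10 seconds, mirroring the 17 h subset above
subset = dataset.filter(lambda batch: batch["duration"] < 10.0)
print(f"{len(subset)} / {len(dataset)} clips kept")
```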
Progress:
So far, I have just quickly tried to fine-tune on the 17-hour subset using the parameters that came with @patrickvonplaten's notebook, but ran into a vanishing/exploding gradient problem. So yeah, I need to tweak a few parameters. Get in touch if you are willing to join in… I'm happy to collaborate with anyone in the community.
So I have been training using the 17-hour subset [optional] of BembaSpeech: train, dev and test.
To test the waters, I first trained using the dev and test sets only. The training went on without a problem. However, when I decided to include the training set and evaluate on the dev set… I started getting nan results.
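One thing worth ruling out is problematic samples in the train split, such as empty transcriptions or unusually long/short clips; in particular, a transcription longer than the audio can support after downsampling makes the CTC loss go to inf. A rough sketch of such a check, reusing the `sentence` and `duration` columns assumed in the loading sketch above:

```python
# Rough sanity check on the train split, reusing the "sentence" and "duration"
# columns assumed in the loading sketch above (adjust to the real column names).
empty = subset.filter(lambda batch: len(batch["sentence"].strip()) == 0)
too_long = subset.filter(lambda batch: batch["duration"] > 10.0)
too_short = subset.filter(lambda batch: batch["duration"] < 0.5)

print(f"empty transcriptions: {len(empty)}")
print(f"clips over 10 s:      {len(too_long)}")
print(f"clips under 0.5 s:    {len(too_short)}")

# wav2vec2 emits roughly 50 output frames per second of 16 kHz audio, so a
# transcription with more characters than that is impossible for CTC to align.
impossible = subset.filter(lambda batch: len(batch["sentence"]) > batch["duration"] * 50)
print(f"transcriptions too long for their clip: {len(impossible)}")
```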
Hey Claytone, I would suggest playing around a bit with learning_rate and dropout. I'd try both reducing and increasing the learning rate… and reducing dropout if you keep getting nan for the training loss.
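To make that concrete, here is a rough sketch of where those knobs live when fine-tuning XLSR-Wav2Vec2 with `transformers`, plus two related guards (gradient clipping and `ctc_zero_infinity`) that are often used against nan/inf losses. The values are placeholders to sweep over, not recommendations, and the processor path is an assumption:

```python
# Rough sketch of the learning_rate and dropout knobs for XLSR-Wav2Vec2
# fine-tuning. Values are placeholders to sweep, not recommendations;
# the processor path below is an assumption.
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor, TrainingArguments

processor = Wav2Vec2Processor.from_pretrained("./wav2vec2-xlsr-bemba-processor")

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.05,    # dropout knobs: try lowering these if the loss hits nan
    hidden_dropout=0.05,
    feat_proj_dropout=0.0,
    layerdrop=0.05,
    ctc_loss_reduction="mean",
    ctc_zero_infinity=True,    # extra guard: zero out infinite CTC losses instead of propagating them
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_extractor()  # the convolutional feature extractor is usually kept frozen

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-bemba",
    learning_rate=1e-4,            # sweep in both directions, e.g. 3e-5, 1e-4, 3e-4
    warmup_steps=500,
    max_grad_norm=1.0,             # extra guard: gradient clipping against exploding gradients
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=30,
    fp16=True,
    evaluation_strategy="steps",
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
)
```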
Thank you @patrickvonplaten. I will try that too. Is there a restriction on the maximum and minimum durations (lengths) of the audio files the model accepts? Just in case…