How much compute are we expected to need in order to fine-tune the Wav2Vec2 XLSR model?

I just tried running the fine-tuning code, plus some minor modifications, on an EC2 instance with a V100, and it just wasn’t enough, even after reducing the batch size.

What are your experiences with the big Wav2Vec2 models, especially the multilingual XLSR model?

I am also trying to fine-tune the multilingual XLSR model. Could you share how big your dataset is and your training details (elapsed time, number of epochs, etc.)? I’m guessing your problem is computational cost when you say the V100 wasn’t enough.

I forgot to update this thread. My main problem was the improper segmentation of some audio files. Once I segmented them correctly, training ran just fine.
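The thread doesn’t show the actual segmentation code, but the fix amounts to capping clip length before feeding audio to the model. A minimal pure-Python sketch (function name, 15-second cap, and the stand-in waveform are all illustrative, not the poster’s actual values):

```python
def segment_samples(samples, sample_rate=16_000, max_seconds=15.0):
    """Split a mono waveform (a sequence of samples) into chunks no longer
    than max_seconds, so individual training examples stay memory-friendly."""
    max_len = int(sample_rate * max_seconds)
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]

clip = [0.0] * (16_000 * 40)  # 40 s of silence as a stand-in waveform
chunks = segment_samples(clip)
print([len(c) / 16_000 for c in chunks])  # → [15.0, 15.0, 10.0]
```

Overly long clips are a common cause of out-of-memory errors here, since a single multi-minute recording can blow past GPU memory on its own regardless of batch size.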

I’m fine-tuning the model on a very small amount of data, so these numbers might not mean much to you (on a V100):

  • Batch size of 32
  • 16kHz sampling rate
  • Mixed precision
  • ~2mins per epoch
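For anyone hitting the same memory wall, the settings above roughly correspond to a `transformers` configuration like the following. This is a sketch, not the poster’s actual script: the checkpoint name is the public XLSR-53 one, `output_dir` and `num_train_epochs` are made up, and `gradient_checkpointing_enable()` / `group_by_length` are common memory-saving tricks rather than something confirmed in the thread.

```python
from transformers import TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model.freeze_feature_extractor()       # don't backprop through the CNN front-end
model.gradient_checkpointing_enable()  # trade compute for a large memory saving

training_args = TrainingArguments(
    output_dir="./xlsr-finetuned",    # illustrative path
    per_device_train_batch_size=32,   # batch size of 32, as above
    fp16=True,                        # mixed precision, as above
    num_train_epochs=30,              # illustrative
    group_by_length=True,             # batch similar-length clips to cut padding
)
```

If 32 still doesn’t fit, lowering `per_device_train_batch_size` and compensating with `gradient_accumulation_steps` keeps the effective batch size the same.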

We are organizing a “fine-tuning XLSR-53” event. Check this announcement: [Open-to-the-community] XLSR-Wav2Vec2 Fine-Tuning Week for Low-Resource Languages. It would be awesome if you want to participate 🙂


How did you calculate the time per epoch? For how many epochs did you train, and how large was your training set? Also, how much audio (total duration) was in your validation set?
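On the first question, time per epoch follows mechanically from dataset size, batch size, and per-step time. A small sketch with purely illustrative numbers (2,048 clips and 1.875 s per optimizer step are assumptions, not figures from this thread):

```python
def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    # Ceiling division: the last, possibly partial, batch still counts as a step.
    return -(-num_examples // batch_size)

def epoch_minutes(num_examples: int, batch_size: int, seconds_per_step: float) -> float:
    return steps_per_epoch(num_examples, batch_size) * seconds_per_step / 60

# e.g. 2,048 clips at batch size 32 and ~1.875 s per step:
print(epoch_minutes(2_048, 32, 1.875))  # → 2.0 (minutes per epoch)
```

In practice the Trainer’s progress bar reports seconds per step directly, so multiplying by the number of steps per epoch gives the same estimate.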