Just tried running the fine-tuning code (plus some minor modifications) on an EC2 instance with a V100, and the GPU just wasn't enough, even after reducing the batch size.
What are y'all's experiences with the big Wav2Vec2 models, especially the XLSR multilingual model?
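For context, these are the kinds of memory-saving knobs I mean: gradient checkpointing, fp16, freezing the feature encoder, and gradient accumulation. A rough sketch assuming the standard Hugging Face `Trainer` setup; the checkpoint name and hyperparameter values below are just placeholders, not what I actually ran:

```python
from transformers import Wav2Vec2ForCTC, TrainingArguments

# Placeholder checkpoint -- swap in whichever XLSR checkpoint you are fine-tuning.
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")

# Trade compute for memory: recompute activations during the backward pass.
model.gradient_checkpointing_enable()

# Freezing the CNN feature encoder saves memory and is standard practice for
# fine-tuning (called freeze_feature_extractor in older transformers versions).
model.freeze_feature_encoder()

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-finetune",
    per_device_train_batch_size=4,   # small per-step batch to fit in 16 GB
    gradient_accumulation_steps=8,   # effective batch size of 32
    fp16=True,                       # mixed precision roughly halves activation memory
    num_train_epochs=30,
    save_steps=500,
    logging_steps=100,
)
```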
I am also trying to train the XLSR multilingual model. Could you share how big your dataset is and your training details (elapsed time, number of epochs, etc.)? I'm guessing your problem is computational cost when you say it wasn't enough.
How did you calculate the time per epoch, and for how many epochs did you train… or what is the total size of your training data? Also, what is the total audio duration of your validation set?
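In case it's useful, here is the back-of-envelope calculation I'd use to estimate time per epoch: steps per epoch times seconds per optimizer step (read off the `Trainer` progress bar). All numbers below are made-up placeholders:

```python
# Back-of-envelope estimate of training time per epoch.
# Every number here is a placeholder -- plug in your own.
num_train_examples = 10_000           # size of the training set
per_device_batch_size = 4
gradient_accumulation_steps = 8
seconds_per_step = 2.0                # per optimizer step, from the Trainer progress bar

effective_batch = per_device_batch_size * gradient_accumulation_steps
steps_per_epoch = num_train_examples // effective_batch   # optimizer steps per epoch
epoch_hours = steps_per_epoch * seconds_per_step / 3600

print(f"~{steps_per_epoch} steps/epoch, ~{epoch_hours:.1f} h/epoch")
```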