Wav2vec2-large-xlsr-53 for a non-listed low-resource language


Can we fine-tune the model on a low-resource language that is not among the 53 listed languages?
Does it have to be in the same language family as the 53 listed?



Yes, you can fine-tune the model on a low-resource language outside the ones the model was pretrained on. In the original XLSR-53 paper ([2006.13979] Unsupervised Cross-lingual Representation Learning for Speech Recognition), Tables 3 and 4 show fine-tuning on out-of-pretraining languages.
In addition, during a school project we fine-tuned XLSR-53 on Czech and Ukrainian and got good results (feel free to check my GitHub - omarsou/wav2vec_xlsr_cv_exp: Experiments on out of training languages (from Common Voice https://commonvoice.mozilla.org/) using Wav2Vec).
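To make this concrete, here is a minimal sketch of one step that is specific to fine-tuning on an out-of-pretraining language: building a fresh character vocabulary for the CTC head from your own transcripts, since the new language's alphabet is not covered by any existing tokenizer. This is an assumed workflow (the post does not show code); the transcripts and the `build_ctc_vocab` helper below are hypothetical, and the special-token conventions (`|` as word delimiter, `[UNK]`, `[PAD]`) follow the usual wav2vec2 CTC setup.

```python
# Sketch (hypothetical helper): build a character-level CTC vocabulary
# for a language outside the XLSR-53 pretraining set.
import re

def build_ctc_vocab(transcripts):
    """Collect every character in the transcripts into a CTC vocabulary dict."""
    chars = set()
    for text in transcripts:
        # strip punctuation the acoustic model cannot predict
        text = re.sub(r"[,.?!\-;:\"]", "", text.lower())
        chars.update(text)
    vocab = {c: i for i, c in enumerate(sorted(chars))}
    # wav2vec2's CTC tokenizer conventionally uses "|" for word boundaries
    vocab["|"] = vocab.pop(" ")
    # special tokens; [PAD] also serves as the CTC blank token
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab

# Hypothetical Czech transcripts (illustrative only)
transcripts = ["dobrý den", "jak se máš?"]
vocab = build_ctc_vocab(transcripts)
```

The resulting dict can be saved as `vocab.json` and passed to `Wav2Vec2CTCTokenizer` (with `unk_token="[UNK]"`, `pad_token="[PAD]"`, `word_delimiter_token="|"`) before loading the pretrained checkpoint and training the randomly initialized CTC head on top of it.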