I wanted to check on this model released by Facebook. It is a cross-lingual model. Does that mean it can understand audio that contains mixed languages?
Can I fine-tune this model on a dataset with 2 languages?
The goal is to avoid having to detect the language of my audio source before running ASR.


I have not personally tried it, but fine-tuning on 2 languages simultaneously would probably work. However, since the number of output tokens (the combined vocabulary of both languages) would be higher, the accuracy would likely be lower for the same amount of data.
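
To make the "number of tokens" point concrete, here is a minimal sketch of how a joint character vocabulary for two languages could be built. It assumes the common character-level CTC setup, with `|` as the word delimiter and `[UNK]` / `[PAD]` as special tokens. The function name `build_joint_vocab` and the sample transcripts are made up for illustration and are not part of any library.

```python
def build_joint_vocab(transcripts_by_lang):
    """Build one character-level vocabulary covering all given languages.

    transcripts_by_lang: dict mapping a language code to a list of transcripts.
    Assumption: character-level CTC targets with "|" as the word delimiter.
    """
    chars = set()
    for texts in transcripts_by_lang.values():
        for text in texts:
            chars.update(text.lower())

    # Spaces are represented by the "|" delimiter token added below.
    chars.discard(" ")
    chars.discard("|")

    vocab = {ch: idx for idx, ch in enumerate(sorted(chars))}
    for special in ("|", "[UNK]", "[PAD]"):
        vocab[special] = len(vocab)
    return vocab


english_only = build_joint_vocab({"en": ["hello world"]})
joint = build_joint_vocab({"en": ["hello world"], "fa": ["سلام"]})
print(len(english_only), len(joint))
```

The joint vocabulary is simply the union of both character sets, so the output layer has to discriminate between more classes than either single-language model would.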

Another way to approach it is to make language detection a separate pipeline. I am assuming that you are doing this in a business context, so you can have 3 models running in parallel: one for language detection and one ASR model for each of the two languages. Based on the output of the language detection model, you pick the corresponding ASR output.
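
A rough sketch of that routing idea, assuming each model is wrapped in a plain Python callable that takes the audio and returns either a language code or a transcript. The name `transcribe_with_routing` and the callables passed to it are hypothetical placeholders, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_with_routing(audio, detect_language, asr_models):
    """Run language detection and every ASR model in parallel, then keep
    the transcript from the model matching the detected language.

    detect_language: callable(audio) -> language code (hypothetical wrapper).
    asr_models: dict mapping language code -> callable(audio) -> transcript.
    """
    with ThreadPoolExecutor(max_workers=len(asr_models) + 1) as pool:
        lang_future = pool.submit(detect_language, audio)
        asr_futures = {
            lang: pool.submit(model, audio) for lang, model in asr_models.items()
        }

        lang = lang_future.result()
        if lang not in asr_futures:
            raise ValueError(f"No ASR model available for language: {lang!r}")
        return lang, asr_futures[lang].result()
```

Running everything in parallel trades extra compute for lower latency. If compute matters more, you could run detection first and then call only the matching ASR model.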

Yeah, I see a lot of people have fine-tuned on a single language. So I was wondering why no one has done it for 2 languages in one model?

Can XLS-R do language detection? Or is there any model that does language detection on audio?


Hi @becks,
have you found a solution to your problem?
I have the same problem: I want a model that handles both Persian and English characters. As you know, this is more or less possible with Kaldi-based models, but I have no idea how to do it with Wav2vec2!
@patrickvonplaten, is it possible to fine-tune Wav2vec2 on two different languages simultaneously?