How does XLSR-Wav2Vec2 behave on noisy data?

I would like to train my own ASR system for a very noisy environment.

If anyone has experience with this, it would be great to hear from you here.


I’d like to share my own experience on this subject.

Together with Dmytro Chaplynsky, we added noise to Common Voice 10, and I successfully trained a model on the resulting data.

The published model: Yehor/wav2vec2-xls-r-300m-uk-with-small-lm-noisy · Hugging Face

The noised data: GitHub - egorsmkv/speech-recognition-uk: Speech Recognition for Ukrainian

This model is trained for Ukrainian.

I have posted the metrics on the model's HF page.
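
For anyone who wants to try something similar on their own data, here is a minimal sketch of one common way to add noise to clean speech: mixing in a background recording at a chosen signal-to-noise ratio (SNR). This is only an illustration and not necessarily how the Common Voice data above was noised. The function names (mix_at_snr, mean_power), the synthetic sine "speech", the Gaussian "background", and the 10 dB target are all assumptions made for the example. It uses plain Python lists to stay dependency-free.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech with noise added at the requested SNR in dB.

    Hypothetical helper for illustration; assumes both inputs are
    non-empty lists of floats and that the noise is not all zeros.
    """
    # Loop the noise clip so that it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    # Choose the scale so that
    # speech_power / scaled_noise_power == 10 ** (snr_db / 10).
    target_ratio = 10 ** (snr_db / 10)
    scale = math.sqrt(mean_power(speech) / (mean_power(looped) * target_ratio))
    return [s + scale * n for s, n in zip(speech, looped)]


# Synthetic stand-ins for real recordings (assumed 16 kHz, one second).
random.seed(0)
sample_rate = 16_000
clean = [0.5 * math.sin(2 * math.pi * 220 * i / sample_rate)
         for i in range(sample_rate)]
background = [random.gauss(0.0, 1.0) for _ in range(sample_rate // 2)]

noisy = mix_at_snr(clean, background, snr_db=10)
```

Lower snr_db values give a noisier result. Picking a different SNR for each utterance would be one way to get varied training data, but that is a design choice to tune for your own environment.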