I am currently trying to run this model: facebook/wav2vec2-xls-r-2b-22-to-16
The example code using the pipeline gives significantly different results compared to the API hosted on Hugging Face. I recorded the same audio and sent it to the API through the website, and also ran the sample code on Colab; the outputs are quite different.
I also ran it on the patrickvonplaten/librispeech_asr_dummy dataset and checked the audio; the model's output differs from the expected text translation.
I tried the second, step-by-step method as well; it fails with "cannot import name 'SpeechEncoderDecoder' from 'transformers'".
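Looking at the transformers source, I suspect the class is exposed as SpeechEncoderDecoderModel rather than SpeechEncoderDecoder, so this is the import I tried instead (a sketch, assuming a recent transformers release; the checkpoint is several GB, so the actual load is commented out):

```python
# The public class appears to be SpeechEncoderDecoderModel,
# paired with Speech2Text2Processor for this checkpoint.
from transformers import SpeechEncoderDecoderModel, Speech2Text2Processor

MODEL_ID = "facebook/wav2vec2-xls-r-2b-22-to-16"

# Heavy download, so commented out here:
# model = SpeechEncoderDecoderModel.from_pretrained(MODEL_ID)
# processor = Speech2Text2Processor.from_pretrained(MODEL_ID)
```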
Could you check what might be wrong? I can share my Colab notebook if needed.
Thanks for your help in advance.