Model generating incorrect prediction

Hello Everyone,

When I decode the audio, the model generates the wrong transcription. Can anyone help me figure out what is going wrong?

Dataset:
```python
from datasets import load_dataset

dataset = load_dataset("mozilla-foundation/common_voice_8_0", "ur", split="test", use_auth_token=True)
```

Installed Libraries:

```shell
!pip install https://github.com/kpu/kenlm/archive/master.zip
!pip install pyctcdecode==0.3.0
!pip install datasets==2.0.0
!pip install torchaudio==0.11
!pip install transformers==4.18.0
```

Code:
```python
import IPython.display as ipd

audio_sample = dataset[3]
print(audio_sample["sentence"].lower())
ipd.Audio(data=audio_sample["audio"]["array"], autoplay=True, rate=audio_sample["audio"]["sampling_rate"])
```

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("BakhtUllah123/xls-r-ur-large")
model = Wav2Vec2ForCTC.from_pretrained("BakhtUllah123/xls-r-ur-large")
inputs = processor(audio_sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
```
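One thing worth checking: the `processor(...)` call passes `sampling_rate=16_000`, but Common Voice 8.0 clips ship at 48 kHz, so the array needs to be resampled to 16 kHz first (for example with `dataset.cast_column("audio", Audio(sampling_rate=16_000))` from `datasets`). Feeding 48 kHz audio while claiming 16 kHz is a common cause of garbage transcriptions. A minimal sketch of the rate conversion itself, using `scipy` on dummy audio since the real dataset requires an auth token:

```python
import numpy as np
from scipy.signal import resample_poly

# Dummy stand-in for one Common Voice clip: 1 second of audio at 48 kHz.
sr_in, sr_out = 48_000, 16_000
audio_48k = np.random.randn(sr_in).astype(np.float32)

# 48 kHz -> 16 kHz is an exact 1/3 ratio, so polyphase resampling
# yields exactly one third as many samples.
audio_16k = resample_poly(audio_48k, up=1, down=3)

print(len(audio_16k))  # 16000 samples, i.e. 1 second at 16 kHz
```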

```python
import torch

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
transcription[0].lower()
```
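For reference, the `torch.argmax` + `batch_decode` steps amount to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. A toy sketch with a made-up three-token vocabulary (not the real model's vocab), which can help sanity-check what the decoder is doing:

```python
import numpy as np

def ctc_greedy_decode(logits, id_to_token, blank_id=0):
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks."""
    ids = logits.argmax(axis=-1)  # most likely token id per time frame
    # Keep a frame's id only if it differs from the previous frame's id.
    collapsed = [i for i, prev in zip(ids, [None, *ids[:-1]]) if i != prev]
    return "".join(id_to_token[int(i)] for i in collapsed if i != blank_id)

# Toy vocab: 0 = blank, 1 = "h", 2 = "i"; four frames of fake logits.
vocab = {0: "", 1: "h", 2: "i"}
toy_logits = np.array([
    [0.0, 5.0, 0.0],  # frame 1 -> "h"
    [0.0, 5.0, 0.0],  # frame 2 -> "h" (repeat, collapsed)
    [9.0, 0.0, 0.0],  # frame 3 -> blank
    [0.0, 0.0, 5.0],  # frame 4 -> "i"
])

print(ctc_greedy_decode(toy_logits, vocab))  # "hi"
```

Since `kenlm` and `pyctcdecode` are installed, beam-search decoding with a language model may also be the intent; that path goes through a different processor class and decodes the raw logits rather than argmax ids.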

Output:
[screenshot of the incorrect transcription]

@patrickvonplaten Could you please take a look?