[STT] Using a different Hugging Face pretrained model gives wrong results => Wav2Vec2 vs patrickvonplaten demo

I am a novice here and I am trying to use a different pretrained model in place of the default Wav2Vec2 one. I am currently playing with the create_wav2vec2.py script provided by PyTorch: android-demo-app/create_wav2vec2.py at master · pytorch/android-demo-app · GitHub

I load the pretrained model from Hugging Face, but during the sanity check the transcribed text is totally wrong.

The line I changed, from:

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

to:

model1 = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-base-timit-demo-colab")
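In case it helps, here is roughly the sanity check I am running. This is only a minimal sketch: it assumes the model repo also ships a processor/tokenizer config, uses the Hugging Face processor for decoding (which is not necessarily the same decoding logic as inside create_wav2vec2.py), and "test_clip.wav" is just a placeholder for my 16 kHz mono test clip.

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "patrickvonplaten/wav2vec2-base-timit-demo-colab"
# Assumes the repo provides both the model weights and a processor config
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# "test_clip.wav" is a placeholder for the audio file used in the sanity check
waveform, sample_rate = torchaudio.load("test_clip.wav")
if sample_rate != 16000:
    # Wav2Vec2 models expect 16 kHz input
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

inputs = processor(waveform[0].numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy (argmax) CTC decoding of the transcription
predicted_ids = torch.argmax(logits, dim=-1)
print("Result:", processor.batch_decode(predicted_ids)[0])
```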

Expected answer:

Result: I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT

But I got:

Result: J <pad></s>DJ<pad>F</s>DJF<pad>JBJSN JKJCJ JFJO<pad>YLJCJ L<pad>HL<pad> F<pad>F</s> JC<pad>JHKJHLRFJ<pad>

Could somebody advise what is wrong here?