Live Transcription/ASR

Hello all,

I hope everything is going well with you guys.

I need an offline “Live ASR Engine” for my project. I used the GitHub repo oliverguhr/wav2vec2-live, which implements live speech recognition using Facebook's wav2vec 2.0 model.
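Live ASR of this kind generally reads short fixed-size frames from the microphone, uses voice activity detection (VAD) to decide which frames contain speech, and sends each complete utterance to the model in one piece. Below is a minimal sketch of that segmentation step. It assumes 16 kHz mono float samples already split into 30 ms frames. It uses a simple RMS energy threshold as a stand-in for a real VAD such as webrtcvad. The threshold and hangover values are illustrative assumptions, not values taken from the repo. The hangover matters because splitting a sentence at every short pause gives the model less context, which can hurt accuracy.

```python
import math
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000                          # wav2vec2 checkpoints expect 16 kHz audio
FRAME_MS = 30                                 # frame length in milliseconds
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def segment_utterances(
    frames: Iterable[List[float]],
    threshold: float = 0.02,   # assumed energy threshold; tune for your microphone
    hangover: int = 10,        # silent frames tolerated before an utterance ends (~300 ms)
) -> Iterator[List[float]]:
    """Group speech frames into utterances, yielding each as one flat sample list.

    An utterance starts at the first frame whose energy reaches `threshold`
    and ends after `hangover` consecutive quieter frames, so short pauses
    inside a sentence do not split it.
    """
    current: List[float] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.extend(frame)
            silent_run = 0
        elif current:
            current.extend(frame)          # keep trailing silence as padding
            silent_run += 1
            if silent_run >= hangover:
                yield current
                current, silent_run = [], 0
    if current:                            # flush speech still buffered at end of stream
        yield current


if __name__ == "__main__":
    speech = [0.1] * FRAME_LEN
    silence = [0.0] * FRAME_LEN
    stream = [speech] * 5 + [silence] * 12 + [speech] * 3
    chunks = list(segment_utterances(stream))
    print(len(chunks), [len(c) // FRAME_LEN for c in chunks])  # 2 [15, 3]
```

Each yielded utterance is what would be passed to the ASR model. In a real setup the frames would come from the audio input device, and a trained VAD would replace the energy test.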

The problem is that the recognition accuracy I got with this wav2vec2 model was lower than I expected. I have two questions:

  1. How can I improve the accuracy of the ASR engine while still using the wav2vec2 model?
  2. I found two other models on Hugging Face: Speech2Text and Speech2Text2. I tried to modify the repository above to use these models for live transcription, but did not succeed. Has anyone used these models for live transcription? If so, please share your advice.
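One way to approach the second question is to leave the audio capture and segmentation side of a live pipeline unchanged and treat the model as a pluggable function from an utterance to text. Switching from wav2vec2 to Speech2Text then only means swapping that function. Below is a minimal sketch under these assumptions:

- Utterances arrive as 16 kHz mono float samples, for example from a VAD segmenter.
- `live_transcribe` and `make_speech2text_transcriber` are hypothetical names, not part of the wav2vec2-live repo.
- `facebook/s2t-small-librispeech-asr` is just one example of an English Speech2Text ASR checkpoint.
- The transformers imports are lazy, so the loop itself runs without them.

```python
from typing import Callable, Iterable, Iterator, List

# A transcriber maps one utterance (16 kHz mono float samples) to text.
Transcriber = Callable[[List[float]], str]


def live_transcribe(utterances: Iterable[List[float]], transcribe: Transcriber) -> Iterator[str]:
    """Run the given model on each utterance as soon as it is complete."""
    for samples in utterances:
        text = transcribe(samples).strip()
        if text:                  # skip empty results, e.g. noise-only chunks
            yield text


def make_speech2text_transcriber(
    checkpoint: str = "facebook/s2t-small-librispeech-asr",  # example checkpoint (assumption)
) -> Transcriber:
    """Wrap a Hugging Face Speech2Text checkpoint as a Transcriber.

    Requires numpy, torch and transformers. The model is loaded once here,
    not once per utterance.
    """
    import numpy as np
    import torch
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    processor = Speech2TextProcessor.from_pretrained(checkpoint)
    model = Speech2TextForConditionalGeneration.from_pretrained(checkpoint)
    model.eval()

    def transcribe(samples: List[float]) -> str:
        audio = np.asarray(samples, dtype=np.float32)
        inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
        with torch.no_grad():
            # Speech2Text is encoder-decoder: text comes from autoregressive generation.
            ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
        return processor.batch_decode(ids, skip_special_tokens=True)[0]

    return transcribe


if __name__ == "__main__":
    # Stand-in model so the loop can be exercised without downloading weights.
    fake = lambda samples: f"<{len(samples)} samples>"
    for line in live_transcribe([[0.0] * 480, [0.0] * 960], fake):
        print(line)  # <480 samples>, then <960 samples>
```

The main structural difference from the wav2vec2 code path is how text is produced. Speech2Text is a sequence-to-sequence model, so text comes from `model.generate()`, not from the argmax over CTC logits typically used with wav2vec2. That is usually the part that needs rewriting when porting a wav2vec2 inference class. With real weights, `live_transcribe(segment_utterances(frames), make_speech2text_transcriber())` would combine the two pieces.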