Use wav2vec2 models with a microphone easily

Hello folks,
I wrote a little lib that lets you use any wav2vec2 model from the model hub with a microphone. Since wav2vec2 does not support streaming mode, I used voice activity detection (VAD) to split the audio into chunks that I can feed into the model.
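
To illustrate the chunking idea, here is a minimal sketch. It is not the library's actual code: a simple energy threshold stands in for a real voice activity detector, and the names `frame_energy`, `vad_chunks`, `threshold` and `max_silence` are hypothetical, chosen only for this example.

```python
from typing import Iterable, Iterator, List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def vad_chunks(
    frames: Iterable[List[float]],
    threshold: float = 0.01,
    max_silence: int = 3,
) -> Iterator[List[float]]:
    """Group frames into utterance chunks.

    Assumption: a frame counts as speech when its energy exceeds
    `threshold`. An utterance ends after `max_silence` consecutive
    non-speech frames, or when the input runs out.
    """
    chunk: List[float] = []
    silence = 0
    in_speech = False
    for frame in frames:
        if frame_energy(frame) > threshold:
            # Speech frame: start or continue the current utterance.
            in_speech = True
            silence = 0
            chunk.extend(frame)
        elif in_speech:
            # Quiet frame inside an utterance: keep it so short pauses
            # do not split the chunk, and count it toward the limit.
            silence += 1
            chunk.extend(frame)
            if silence >= max_silence:
                yield chunk
                chunk, silence, in_speech = [], 0, False
        # Quiet frames before any speech are dropped.
    if in_speech:
        # Flush an utterance that was still open at the end of the input.
        yield chunk
```

Each yielded chunk is one complete utterance that could then be passed to the model in a single call. A trained detector would likely be more robust to background noise than this threshold.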

Here is a little usage example. You can find the code on GitHub.

from live_asr import LiveWav2Vec2

# Any wav2vec2 model name from the model hub can be used here.
german_model = "maxidl/wav2vec2-large-xlsr-german"
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        # Print audio length, inference time and the transcription.
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    # Stop the recognizer on Ctrl+C.
    asr.stop()

If you have any questions or feedback, feel free to write to me.


Pretty cool! Would you consider making an implementation for Google Colab / notebooks? Similar to this:

But with the VAD to get a near real-time transcription.

Since it runs pretty well on a CPU, there is no need (at least for me) to run it on Colab.