Hello folks,
I wrote a little library that lets you use any wav2vec2 model from the model hub with a microphone. Since wav2vec2 does not support streaming mode, I use voice activity detection to split the microphone input into audio chunks that I can feed into the model.
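To illustrate the idea, here is a minimal sketch of energy-based chunking. This is not the library's actual VAD (which could use a dedicated detector such as webrtcvad); it is just a toy stand-in showing how silence can delimit the chunks that get sent to the model. The function name and all thresholds are illustrative assumptions.

```python
import numpy as np

def chunk_by_energy(samples, frame_len=1600, threshold=0.01, min_silence_frames=5):
    """Toy voice activity detection: split a mono signal into voiced
    chunks. A frame counts as silence when its RMS energy is below
    `threshold`; `min_silence_frames` consecutive silent frames end
    the current chunk. (Illustrative stand-in for a real VAD.)"""
    chunks, current, silence = [], [], 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms >= threshold:
            current.extend(frame)   # voiced frame: keep collecting
            silence = 0
        elif current:
            silence += 1
            if silence >= min_silence_frames:
                # long enough pause: close the current chunk
                chunks.append(np.array(current))
                current, silence = [], 0
    if current:
        chunks.append(np.array(current))
    return chunks

# Synthetic 16 kHz signal: 0.5 s tone ("speech"), 1 s silence, 0.5 s tone
sr = 16000
t = np.arange(int(0.5 * sr)) / sr
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
signal = np.concatenate([speech, np.zeros(sr), speech])
chunks = chunk_by_energy(signal)
print(len(chunks))  # the pause separates the signal into 2 chunks
```

Each resulting chunk would then be passed to the wav2vec2 model as one utterance.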
Here is a little example; you can find the code on GitHub.
from live_asr import LiveWav2Vec2

german_model = "maxidl/wav2vec2-large-xlsr-german"
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    asr.stop()
If you have any questions or feedback, feel free to write me.