Use wav2vec2 models with a microphone easily

oliverguhr · April 15, 2021, 11:44am

Hello folks,
I wrote a little library that lets you use any wav2vec2 model from the model hub with a microphone. Since wav2vec2 does not support streaming mode, I use voice activity detection (VAD) to split the microphone audio into chunks that I can feed into the model.

Here is a little example; you can find the code on GitHub.

from live_asr import LiveWav2Vec2

# Any wav2vec2 model from the model hub should work here
german_model = "maxidl/wav2vec2-large-xlsr-german"
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        text, sample_length, inference_time = asr.get_last_text()
        # Audio length, inference time, and transcript, tab-separated
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    # Stop on Ctrl+C
    asr.stop()
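
To illustrate the VAD chunking idea mentioned at the top of this post, here is a rough sketch. It is not the library's actual code: it swaps in a plain energy threshold as the voice activity detector, and the names `frame_energy`, `vad_chunks`, `threshold` and `max_silence` are made up for illustration. It assumes the audio already arrives as fixed-size frames of float samples.

```python
from typing import Iterable, Iterator, List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def vad_chunks(
    frames: Iterable[List[float]],
    threshold: float = 0.01,   # hypothetical energy cutoff for "speech"
    max_silence: int = 3,      # hypothetical: quiet frames that end a chunk
) -> Iterator[List[float]]:
    """Group frames into utterance chunks separated by runs of silence."""
    chunk: List[float] = []
    silence = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            # Speech frame: add it and reset the silence counter.
            chunk.extend(frame)
            silence = 0
        elif chunk:
            silence += 1
            if silence >= max_silence:
                # Enough consecutive quiet frames: the utterance is over.
                yield chunk
                chunk, silence = [], 0
            else:
                # Short pause: keep it inside the current utterance.
                chunk.extend(frame)
        # Quiet frames before any speech are dropped.
    if chunk:
        # Flush whatever is left when the input ends.
        yield chunk
```

Each yielded chunk would then be passed to the model as one complete utterance, which is how a non-streaming model can still give near real-time output. A dedicated VAD would likely be more robust against background noise than a fixed energy threshold.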

If you have any questions or feedback, feel free to write me.

SamuelAzran · May 20, 2021, 10:26pm

Pretty cool! Would you consider making an implementation for Google Colab / notebooks? Similar to this:

github.com/voidful/huggingface_notebook/blob/main/xlsr_gpt.ipynb

But with the VAD added, to get near real-time transcription.

oliverguhr · May 28, 2021, 3:14pm

Since it runs pretty well on a CPU, there is no need (at least for me) to run it on Colab.
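
One way to put a number on "runs pretty well" is the real-time factor: inference time divided by audio length, using the `sample_length` and `inference_time` values that the example loop in the first post prints. The helper name `real_time_factor` is hypothetical and not part of the library; this is only a small sketch.

```python
def real_time_factor(sample_length: float, inference_time: float) -> float:
    """Return inference time per second of audio; below 1.0 keeps up with speech."""
    if sample_length <= 0:
        raise ValueError("sample_length must be positive")
    return inference_time / sample_length
```

For example, a 2.0 s chunk transcribed in 0.5 s gives a factor of 0.25, meaning transcription takes a quarter of the time the speech itself took.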
