gr.Interface.from_pipeline() vs gr.Interface()

Hello!

I wrote this simple ASR app using a pipeline, and it works flawlessly:

pip install gradio
pip install transformers

import gradio as gr
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", "facebook/wav2vec2-base-960h")
gr.Interface.from_pipeline(pipe).launch()

But with from_pipeline(), I am limited in what I can do with the output (if I understand pipelines correctly). So I switched to gr.Interface() so that I can add my own logic inside the function:

def asr(mic_input): #edited now
  if "hello" in str(pipe(mic_input)): #convert to string so I can search inside
    reply = "hi"
    return reply
  else:
    return str(pipe(mic_input))

gr.Interface(fn=asr,
             inputs="mic",
             outputs="text",
             ).launch()

What I expect is that if it detects “hello” anywhere in the speech, it replies “hi” instead of returning the transcription. However, after I submit the recorded audio from the mic, the output only displays “error”.
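Separately from the error itself, the "hello" check as written will not fire with this model: the ASR pipeline returns a dict such as {'text': '...'}, so str(pipe(mic_input)) searches the dict's repr, and facebook/wav2vec2-base-960h transcribes in upper case ("HELLO"), so a lower-case "hello" never matches. The function also runs the model twice. A minimal sketch of the reply logic, assuming the pipeline output is that dict with a "text" key; reply_for is a hypothetical helper name:

```python
def reply_for(result: dict) -> str:
    """Return "hi" if the transcript contains the word hello, else the transcript.

    `result` is assumed to be the ASR pipeline's output dict,
    e.g. {"text": "HELLO THERE"}. reply_for is a hypothetical helper.
    """
    transcript = result["text"]
    # wav2vec2-base-960h outputs upper-case text, so normalise case first
    words = transcript.lower().split()
    if "hello" in words:
        return "hi"
    return transcript
```

Inside asr you would then call the model once, e.g. `return reply_for(pipe(audio))`. Splitting into words avoids matching "hello" inside a longer word such as "OTHELLO"; use a plain substring check if you want that behaviour instead.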

I've manipulated the output in similar ways with other tasks, such as text classification, and it worked flawlessly. My guess is that something happens within the pipeline that is specific to processing audio input. If so, how can I manually do what the pipeline does?
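Roughly, for a CTC model like this one the pipeline makes sure the audio is mono float audio at the model's 16 kHz sampling rate, runs the feature extractor and the model, then decodes the most likely token at each frame back into text. A rough manual version of those steps, assuming you already have a 1-D float array and its sample rate. The linear-interpolation resampler and the names resample_linear and transcribe are illustrative stand-ins, not what transformers does internally:

```python
import numpy as np

TARGET_SR = 16_000  # wav2vec2-base-960h expects 16 kHz input


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample 1-D float audio by linear interpolation (a rough stand-in
    for a proper resampler such as torchaudio or librosa)."""
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_t = np.arange(len(audio)) / orig_sr  # timestamps of input samples
    new_t = np.arange(n_out) / target_sr     # timestamps of output samples
    return np.interp(new_t, old_t, audio).astype(np.float32)


def transcribe(audio: np.ndarray, sample_rate: int) -> str:
    """Approximately the steps the ASR pipeline runs for a CTC model."""
    # Heavy imports kept local so resample_linear works without torch
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    name = "facebook/wav2vec2-base-960h"
    processor = Wav2Vec2Processor.from_pretrained(name)  # feature extractor + tokenizer
    model = Wav2Vec2ForCTC.from_pretrained(name)

    speech = resample_linear(audio, sample_rate)
    inputs = processor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # (batch, frames, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```

In a real app, load the processor and model once at startup rather than on every call.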

Hey there @epdavid2, as a tip, you can pass debug=True to launch() to get more information about the error.

So the error is in the asr function: the parameter has a typo. It is named mic_nput, but in the function body you use mic_input.

@osanseviero Hello! I actually corrected mic_nput a while ago, but the error still persists. I’m looking at the debug output right now. Thank you!

SG! As a tip, you can check in the Gradio Docs what the audio input actually gives you (it is a tuple of the sample rate and the numpy array; you just want the second element).
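Building on that tip: with the default type="numpy", the mic value arrives as a (sample_rate, data) tuple, where data is typically int16 and may have two channels. Passing only the array to the pipeline assumes it is already 16 kHz audio, which mic recordings (often 44.1 or 48 kHz) usually are not. A sketch that instead converts to float32 mono and hands the pipeline its dict input form, {"sampling_rate": ..., "raw": ...}, so the pipeline can resample for you (recent transformers versions may need torchaudio installed for that step). to_pipeline_input and build_app are hypothetical names:

```python
import numpy as np


def to_pipeline_input(audio: tuple) -> dict:
    """Turn Gradio's (sample_rate, numpy_array) mic value into the dict
    form accepted by the transformers ASR pipeline."""
    sample_rate, data = audio
    samples = data.astype(np.float32)
    if np.issubdtype(data.dtype, np.integer):
        # integer PCM (e.g. int16): scale into [-1, 1]
        samples /= np.iinfo(data.dtype).max
    if samples.ndim == 2:
        # multi-channel audio is assumed to be (samples, channels); average to mono
        samples = samples.mean(axis=1)
    return {"sampling_rate": sample_rate, "raw": samples}


def build_app():
    """Wire the helper into the Gradio app from the question."""
    import gradio as gr
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", "facebook/wav2vec2-base-960h")

    def asr(mic_input):
        text = pipe(to_pipeline_input(mic_input))["text"]  # run the model once
        # this model emits upper-case text, so compare case-insensitively
        return "hi" if "hello" in text.lower() else text

    return gr.Interface(fn=asr, inputs="mic", outputs="text")
```

Run it with `build_app().launch(debug=True)` so any remaining error is printed in full.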
