Implement a real-time VAD

I want to implement real-time Voice Activity Detection (VAD).
I want to stream audio and create a matching plot that shows the probability of voice occurrence.
I figured out how to stream audio using gr.Interface(fn=…, inputs=, outputs=, live=True) and also how to update a plot using the every argument. I would like the plot to depend on the audio.


Something like this should work:

gr.Interface(fn=…, inputs=gr.Audio(streaming=True), outputs="plot", live=True)
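
To make that more concrete, here is a minimal sketch of how the pieces could fit together. It is not a definitive implementation, and it rests on several assumptions: Gradio 4.x (where the microphone is selected with sources=["microphone"]; 3.x used source="microphone"), 16-bit integer samples from the microphone, and a simple energy heuristic standing in for a real VAD model. The names voice_probability, update, build_demo, and HISTORY are made up for illustration.

```python
import math

HISTORY = 100  # hypothetical: number of recent chunks shown in the plot


def voice_probability(samples, full_scale=32768.0, floor_db=-60.0, ceil_db=-20.0):
    """Map a chunk's RMS level to a 0..1 score.

    This is an energy heuristic, not a trained VAD; swap in a real model
    here if you have one. Assumes 16-bit integer samples (full_scale=32768).
    """
    if len(samples) == 0:
        return 0.0
    rms = math.sqrt(sum(float(s) ** 2 for s in samples) / len(samples)) / full_scale
    if rms <= 0.0:
        return 0.0
    level_db = 20.0 * math.log10(rms)
    # Linearly rescale [floor_db, ceil_db] to [0, 1] and clip.
    return min(1.0, max(0.0, (level_db - floor_db) / (ceil_db - floor_db)))


def update(history, new_chunk):
    """Streaming callback: append the newest score and redraw the plot."""
    from matplotlib.figure import Figure  # lazy import, only needed for the UI

    history = list(history or [])  # state is None on the first call
    if new_chunk is not None:
        _sample_rate, data = new_chunk  # Gradio passes (sample_rate, numpy array)
        history.append(voice_probability(data.reshape(-1).tolist()))
    history = history[-HISTORY:]

    fig = Figure()
    ax = fig.subplots()
    ax.plot(history)
    ax.set_ylim(0, 1)
    ax.set_xlabel("chunk")
    ax.set_ylabel("voice score")
    return history, fig


def build_demo():
    import gradio as gr  # lazy import so the scoring function works without Gradio

    return gr.Interface(
        fn=update,
        inputs=["state", gr.Audio(sources=["microphone"], streaming=True)],
        outputs=["state", "plot"],
        live=True,
    )


if __name__ == "__main__":
    build_demo().launch()
```

The "state" input and output carry the score history between calls, so each new audio chunk extends the same curve instead of replacing it. That is what ties the plot to the audio without needing every.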