Realtime speech-to-text solution?

konyaev · October 15, 2023, 6:12pm

I am looking for a way to run server-side speech recognition with the lowest latency possible.

The ideal solution would process audio in real time, receiving audio samples as a stream (over WebSockets or any other transport). A low-code solution would be preferable.
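One piece of the server side is common to any streaming setup: the messages that arrive over the transport (for example, binary WebSocket messages) have arbitrary sizes, while recognizers usually consume fixed-size audio frames. The sketch below re-slices such a byte stream into fixed frames. It assumes 16 kHz, 16-bit signed mono PCM and 20 ms frames; neither the format nor the frame size comes from this thread, and the iterable of chunks stands in for whatever your server actually receives.

```python
from typing import Iterable, Iterator

# Assumed audio format (not specified in the thread): 16 kHz, 16-bit signed PCM, mono.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20  # hypothetical frame size; 20 ms is a common choice for streaming audio


def frame_pcm_stream(chunks: Iterable[bytes], frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Re-slice arbitrarily sized byte chunks (e.g. WebSocket messages) into fixed-size PCM frames."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000  # 640 bytes for 20 ms
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every complete frame currently held in the buffer.
        while len(buffer) >= frame_bytes:
            yield bytes(buffer[:frame_bytes])
            del buffer[:frame_bytes]
    # Zero-pad a trailing partial frame so no audio is dropped at the end of the stream.
    if buffer:
        yield bytes(buffer) + b"\x00" * (frame_bytes - len(buffer))
```

Because it is a generator, each frame is handed on as soon as enough bytes have arrived, which keeps buffering latency at most one frame long.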

Do Inference Endpoints have functionality that supports this?
If not, do any other HF products support it?
If not, can you recommend any third-party solutions?

Could you also recommend a speech-to-text model that can process input in the form of a stream? I am currently using Whisper (with Inference Endpoints), and unfortunately it can only process an audio file as a whole.
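Whisper operates on fixed 30-second input windows rather than on an open-ended stream. A common workaround, not a built-in Inference Endpoints feature as far as this thread establishes, is to re-run a whole-clip model on a sliding window of the most recent audio each time a little new audio arrives. A minimal sketch, assuming 16 kHz float samples; the 5 s window and 1 s step are hypothetical values, and `transcribe` is a placeholder for any whole-clip model call (for example, a request to a Whisper endpoint).

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def sliding_windows(
    samples: Iterable[float],
    window_s: float = 5.0,  # hypothetical: how much recent audio each call sees
    step_s: float = 1.0,  # hypothetical: how often a new transcription is produced
) -> Iterator[List[float]]:
    """Yield the most recent `window_s` seconds of audio after every `step_s` seconds of new input."""
    window_len = int(window_s * SAMPLE_RATE)
    step_len = int(step_s * SAMPLE_RATE)
    window: deque = deque(maxlen=window_len)  # oldest samples drop off automatically
    since_last = 0
    for sample in samples:
        window.append(sample)
        since_last += 1
        if since_last == step_len:
            since_last = 0
            yield list(window)
    # Flush audio that arrived after the last full step so the tail is also transcribed.
    if since_last > 0:
        yield list(window)


def stream_transcribe(
    samples: Iterable[float], transcribe: Callable[[List[float]], str]
) -> Iterator[str]:
    """Call a whole-clip `transcribe` function (placeholder) on each sliding window."""
    for window in sliding_windows(samples):
        yield transcribe(window)
```

The trade-off is compute: overlapping audio is re-processed on every step, and the transcripts of successive windows overlap, so a real system still needs to merge or de-duplicate them. This sketch does not attempt that.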

RobinM30 · July 24, 2024, 7:43am

Hi,
Did you find a solution to your problem?
I have exactly the same issue…
Thanks,
Robin

Related topics
Topic	Category	Replies	Views	Activity
How to run text to speech from inference endpoint given audio file url?	Beginners	1	899	June 8, 2023
Deploying Whisper Based Live Transcription for 1000 Concurrent users	Intermediate	0	349	June 1, 2024
To create "Inference Endpoints"	Beginners	0	120	January 15, 2024
Allow Multiple Processes at Once	Inference Endpoints on the Hub	0	292	January 2, 2024
Word-by-word TTS model for minimal latency	Research	0	534	April 7, 2024
