Support for ASR inference on longer audio files or on live transcription?

JPwhale · January 26, 2023, 8:58am

Hi,

I've fine-tuned some ASR models (Whisper and XLSR/wav2vec2), and I now want to use them for inference. I've deployed an API on the cloud (GCP), and it works fine for short audio files (up to 2 to 3 minutes). However, I want to work either with longer files or with live transcription. So here are my two questions:

1. Are there functions within Hugging Face, which I might have overlooked, that simplify working with longer audio files? Maybe some way of turning the audio into a stream, or of automatically working with chunks?
2. A similar question for live inference: is there any support within Hugging Face for using these large models for live transcription?

Thanks

MLLife · April 21, 2023, 7:12am

Looking for a similar solution as well. Anyone? @sanchit-gandhi

sanchit-gandhi · April 21, 2023, 6:56pm

Here you go! See openai/whisper-large-v2 · Hugging Face and Google Colab
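
For anyone who can't open the links: as far as I know, the usual approach for long files is chunked inference, where the audio is split into overlapping windows, each window is transcribed, and the results are stitched together. Below is a minimal sketch of that idea, not the code from the linked pages. `chunk_bounds` and `transcribe_long` are hypothetical helper names of mine; the `chunk_length_s` argument of the `transformers` ASR pipeline is what does the chunking for you. The 30-second window is an assumption based on Whisper's input length, and reading from a file path assumes `ffmpeg` is installed.

```python
from typing import Iterator, Tuple


def chunk_bounds(n_samples: int, chunk_len: int, overlap: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of overlapping windows covering the audio.

    Illustration only: consecutive windows share `overlap` samples, so words cut
    at one window's edge appear whole in the next one.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)
        yield start, end
        if end == n_samples:
            break  # the last window already reaches the end of the audio
        start += step


def transcribe_long(path: str, model_id: str = "openai/whisper-large-v2") -> str:
    """Transcribe a long audio file with the pipeline's built-in chunking.

    Hypothetical wrapper; swap `model_id` for your own fine-tuned checkpoint.
    """
    from transformers import pipeline  # imported lazily; needs `pip install transformers`

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumed window length in seconds
    )
    return asr(path)["text"]
```

For CTC models such as wav2vec2, the pipeline also accepts `stride_length_s` to control the overlap between windows. I have not tested this with live audio, so treat it as a starting point for the long-file case only.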
