I've fine-tuned some ASR models (Whisper and XLSR/wav2vec2), and I now want to use them for inference. I've deployed an API on the cloud (GCP), and it works fine for short audio files (up to 2–3 minutes). However, I want to handle either longer files or live transcription. So here are my two questions:
- Are there functions within Hugging Face that I might have overlooked which simplify working with longer audio files? Maybe some way of turning the audio into a stream, or of automatically processing it in chunks?
- A similar question for live inference: is there any support within Hugging Face for using these large models for live transcription?
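
For context, this is roughly what I'm imagining for the first question. It's only a sketch, not my production setup: the model name is a placeholder for my fine-tuned checkpoint, and `chunk_length_s`/`stride_length_s` are parameters I've seen mentioned for chunked long-form inference with the `pipeline` API, though I'm not sure this is the recommended approach:

```python
import numpy as np
from transformers import pipeline

# Placeholder model; I'd substitute my own fine-tuned checkpoint here.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",
    chunk_length_s=30,   # split long audio into 30-second windows
    stride_length_s=5,   # overlap between windows so words cut at a boundary are recovered
)

# One minute of silence as a stand-in for a long recording; a raw NumPy
# array is assumed to be at the feature extractor's sampling rate (16 kHz).
audio = np.zeros(16_000 * 60, dtype=np.float32)
result = asr(audio)
print(result["text"])
```

For the live case, I've seen `transformers.pipelines.audio_utils.ffmpeg_microphone_live` mentioned as a way to feed microphone chunks into the same pipeline, but I don't know whether that is intended for production use or only for demos.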