Hello, I need to fine-tune a Whisper model. When I looked through forums and similar resources, they keep saying that I should use Hugging Face to prepare the audio data. First, do I have to use it? Secondly, does openai/whisper use Hugging Face in the background? What are the benefits of it? If I don't want to use it, can I still use openai/whisper?
Hi,
OpenAI originally open-sourced Whisper at GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision. This is a standalone PyTorch implementation, unrelated to Hugging Face, and it is meant for inference only (it does not ship fine-tuning code).
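For reference, inference with the original package looks roughly like this (a minimal sketch; `"base"` is one of the bundled model sizes and `"audio.mp3"` is a placeholder path):

```python
# pip install -U openai-whisper
import whisper

model = whisper.load_model("base")      # downloads the "base" checkpoint on first use
result = model.transcribe("audio.mp3")  # "audio.mp3" is a placeholder path to your file
print(result["text"])
```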
Hugging Face has also implemented Whisper in the Transformers library, with several additional features, most notably (a short usage sketch follows the list):
- the ability to fine-tune on custom data: Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers
- massively faster transcription: GitHub - Vaibhavs10/insanely-fast-whisper
- optimization for production deployment: https://twitter.com/IlysMoutawwakil/status/1667258837194383360
- distillation of Whisper into a smaller student model with nearly equivalent performance: GitHub - huggingface/distil-whisper: Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
- etc.
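If you go the Transformers route, basic transcription is a few lines (a minimal sketch; `"openai/whisper-small"` is one of the official Hub checkpoints and `"audio.mp3"` is a placeholder path):

```python
from transformers import pipeline

# "openai/whisper-small" is an official checkpoint on the Hub; you could
# swap in e.g. "distil-whisper/distil-large-v2" to try the distilled variant.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = pipe("audio.mp3")  # placeholder path to a local audio file
print(result["text"])
```

Fine-tuning builds on the same checkpoints; the blog post linked above walks through the full training setup with `Seq2SeqTrainer`. So to answer your questions: openai/whisper does not use Hugging Face in the background, and you can use it on its own, but since it does not include training code, Transformers is the most practical path for fine-tuning.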