Hey there, I'm looking for a way to package a fine-tuned Whisper model together with the base model in a single pickled file, so I can upload it to a server and then load it once and keep it on the GPU for quick inference.
Since my fine-tuned model (40 MB) needs the 6.7 GiB base Whisper model, it takes a while to download all the files, and the service depends on the openAI/whisper-model repository to work correctly. If I want to deploy it, I need to make the service more robust and avoid external dependencies.
I'd like something like Whisper's preset loading style: whisper.load_model('my-model'), and then whisper.transcribe('audio.wav')['text'].
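For that kind of one-call usage, a close equivalent with transformers would be wrapping the ASR pipeline in a small loader. This is only a sketch: "merged-whisper" is a placeholder for whatever self-contained checkpoint folder you end up shipping, and load_transcriber is a name I made up.

```python
def load_transcriber(model_dir: str = "merged-whisper"):
    """Load the model once and keep it on the GPU for repeated calls.

    Imports are kept inside the function so this sketch can be defined
    even on a machine without transformers/torch installed.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_dir,
        device=0 if torch.cuda.is_available() else -1,  # pin to GPU 0 if present
        torch_dtype=torch.float16,                      # halves GPU memory use
    )

# Intended usage (mirrors whisper.transcribe(...)['text']):
# asr = load_transcriber()
# text = asr("audio.wav")["text"]
```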
For now, the only way I've found is to download the base and fine-tuned files into two folders and build a pipeline from the model, processor, feature extractor, and tokenizer.
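If the 40 MB fine-tune is a PEFT/LoRA adapter (which its size suggests, though I'm assuming that here), the two folders can be collapsed into one offline checkpoint by merging the adapter into the base weights and saving everything together. merge_adapter and the paths below are placeholders:

```python
def merge_adapter(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Fold a LoRA adapter into the base Whisper weights and save one
    self-contained folder (weights + tokenizer + feature extractor),
    so the deployed service no longer needs the base repo at runtime.

    Imports are inside the function so the sketch can be defined
    without transformers/peft installed.
    """
    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    from peft import PeftModel

    base = WhisperForConditionalGeneration.from_pretrained(base_id)
    # Attach the small adapter, then merge it into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    merged.save_pretrained(out_dir, safe_serialization=True)
    WhisperProcessor.from_pretrained(base_id).save_pretrained(out_dir)

# Intended usage (run once, then ship only out_dir to the server):
# merge_adapter("openai/whisper-large-v2", "my-adapter", "merged-whisper")
```

After this, the service only loads "merged-whisper" locally and never touches the hub.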
Other things I'd like to improve: choosing the best format for serializing this model file so it's small and quick to load, or deleting the trained weights for languages the service will never use (Hebrew, Arabic, etc.), though I understand that this could be difficult or impossible since the base model was already trained on them.
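On the format question (this doesn't address language pruning, which really is hard because the multilingual knowledge is entangled in shared weights): casting to float16 and saving in safetensors instead of pickle roughly halves the file size and gives faster, safer loading. A sketch, with export_fp16 and the folder names as placeholders:

```python
def export_fp16(src_dir: str, out_dir: str) -> None:
    """Re-save a Whisper checkpoint in fp16 + safetensors.

    fp32 -> fp16 roughly halves disk and GPU memory use with little
    quality loss for inference; safetensors avoids pickle entirely and
    loads via zero-copy memory mapping. Imports are inside the function
    so the sketch can be defined without transformers installed.
    """
    from transformers import WhisperForConditionalGeneration

    model = WhisperForConditionalGeneration.from_pretrained(src_dir)
    model.half()  # cast all parameters to float16
    model.save_pretrained(out_dir, safe_serialization=True)

# Intended usage:
# export_fp16("merged-whisper", "merged-whisper-fp16")
```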