I followed the tutorial to fine-tune a whisper-small model (in my case I used whisper-base-en), and training completed successfully.
However, after pushing the model to my hub and trying to load it with the pipeline function:
pipe = pipeline("automatic-speech-recognition", model="beeezeee/whisper-base")
I get the following error:
OSError: Can't load tokenizer for 'beeezeee/whisper-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'beeezeee/whisper-base' is the correct path to a directory containing all relevant files for a WhisperTokenizerFast tokenizer.
I am not sure what the issue is, but it seems my trainer never saved any tokenizer files (although, from what I have read, ASR models are handled differently from regular NLP models).
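To sanity-check that suspicion, here is a minimal sketch of how the repo's file list could be compared against the files a Whisper tokenizer and feature extractor usually ship with. The expected file names are my assumption based on what the openai/whisper-base repo contains, and `missing_processor_files` and the example listing are made up for illustration.

```python
# Hypothetical helper: the expected file names below are assumptions based on
# what the openai/whisper-base repo contains, not an official list.
EXPECTED_FILES = {
    "tokenizer": ["vocab.json", "merges.txt", "tokenizer_config.json"],
    "feature_extractor": ["preprocessor_config.json"],
}


def missing_processor_files(filenames):
    """Return {component: [missing file names]} for components with gaps."""
    present = set(filenames)
    missing = {}
    for component, expected in EXPECTED_FILES.items():
        absent = [name for name in expected if name not in present]
        if absent:
            missing[component] = absent
    return missing


# Made-up listing: model weights and config are there, tokenizer files are not.
example_listing = [
    "config.json",
    "pytorch_model.bin",
    "preprocessor_config.json",
    "training_args.bin",
]
print(missing_processor_files(example_listing))
```

If the tokenizer entry comes back non-empty, my guess is that the tokenizer was never saved next to the model, so saving the processor into the same output directory and pushing again might be the missing step, though I have not confirmed that.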
@sanchit-gandhi - I feel like I have seen your name quite often in this space on this website (I followed your tutorial as well and I got the same results - no Tokenizer from the training was created).
Here is the list of files my trainer produces:
Let me know if I can provide more information. Thanks.