Can't load tokenizer

I’m trying to use this fine-tuned Whisper model, but I’m getting the following error. What am I doing wrong?

OSError: Can't load tokenizer for 'gcasey2/whisper-large-v3-ko-en-v2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gcasey2/whisper-large-v3-ko-en-v2' is the correct path to a directory containing all relevant files for a WhisperTokenizerFast tokenizer.

Here’s my code.

    from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline

    transcriber = pipeline("automatic-speech-recognition",
                           model="gcasey2/whisper-large-v3-ko-en-v2")
    out = transcriber(input_audio_file)
    print(out['text'])

Using this same code I can run inference on another fine-tuned Whisper model: byoussef/whisper-large-v2-Ko.

Hey @wildcard00, it looks to me like they forgot to include the tokenizer. Looking at the files in that repo, there should be tokenizer files alongside the weights, but they only uploaded the model.

Some things you can try:

  • Leave a comment on the model repo asking them to upload the tokenizer, then wait
  • The tokenizer they used is most likely the same as the one from the base openai/whisper-large-v3 checkpoint, so you can try using that. I’m not sure how to do this elegantly with pipelines, but you could create your own checkpoint with the tokenizer files from OpenAI added alongside the model files from gcasey2/whisper-large-v3-ko-en-v2 (see the sketch after this list). You could even upload this to your own Hugging Face account
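
Something along these lines might work. This is just a minimal sketch, and it assumes the tokenizer really was left unchanged from the base openai/whisper-large-v3 checkpoint (that part is a guess); the output folder name is made up:

    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Model weights come from the fine-tuned repo; tokenizer + feature extractor
    # come from the base OpenAI checkpoint (assumption: they were not modified
    # during fine-tuning).
    model = AutoModelForSpeechSeq2Seq.from_pretrained("gcasey2/whisper-large-v3-ko-en-v2")
    processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

    transcriber = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
    )

    out = transcriber("audio.wav")
    print(out["text"])

    # Optional: save everything into one local folder so it loads like a normal
    # checkpoint next time (you could also push_to_hub from here).
    model.save_pretrained("whisper-large-v3-ko-en-v2-with-tokenizer")
    processor.save_pretrained("whisper-large-v3-ko-en-v2-with-tokenizer")

Once that folder is saved (or pushed to your own account), your original one-liner pipeline call pointed at it should work the same way it does with byoussef/whisper-large-v2-Ko.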