Can't load tokenizer

I’m trying to use this fine-tuned Whisper model, but I’m getting the following error. What am I doing wrong?

OSError: Can't load tokenizer for 'gcasey2/whisper-large-v3-ko-en-v2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'gcasey2/whisper-large-v3-ko-en-v2' is the correct path to a directory containing all relevant files for a WhisperTokenizerFast tokenizer.

Here’s my code.

    from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline

    transcriber = pipeline("automatic-speech-recognition",
                           model="gcasey2/whisper-large-v3-ko-en-v2")
    out = transcriber(input_audio_file)
    print(out['text'])

Using this same code I can run inference on another fine-tuned Whisper model: byoussef/whisper-large-v2-Ko.

Hey @wildcard00, it looks to me like they forgot to include the tokenizer. Looking at the files in that repo, there should be tokenizer files alongside the weights, but they only uploaded the model.

Some things you can try:

  • Leave a comment on the model repo asking them to upload the tokenizer, then wait
  • The tokenizer they used is most likely the same as the one from the base openai/whisper-large-v3 checkpoint, so you can try using that. I’m not sure how to do this elegantly with pipelines, but you could create your own checkpoint with the tokenizer files from OpenAI added alongside the model files from gcasey2/whisper-large-v3-ko-en-v2 (see the sketch after this list). You could even upload this to your own Hugging Face account
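
Something along these lines might work. This is just a minimal sketch, and it assumes the tokenizer really was left unchanged from the base openai/whisper-large-v3 checkpoint (that part is a guess); the output folder name is made up:

    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Model weights come from the fine-tuned repo; tokenizer + feature extractor
    # come from the base OpenAI checkpoint (assumption: they were not modified
    # during fine-tuning).
    model = AutoModelForSpeechSeq2Seq.from_pretrained("gcasey2/whisper-large-v3-ko-en-v2")
    processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

    transcriber = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
    )

    out = transcriber("audio.wav")
    print(out["text"])

    # Optional: save everything into one local folder so it loads like a normal
    # checkpoint next time (you could also push_to_hub from here).
    model.save_pretrained("whisper-large-v3-ko-en-v2-with-tokenizer")
    processor.save_pretrained("whisper-large-v3-ko-en-v2-with-tokenizer")

Once that folder is saved (or pushed to your own account), your original one-liner pipeline call pointed at it should work the same way it does with byoussef/whisper-large-v2-Ko.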