Whisper Message: Special tokens have been added in the vocabulary

flabbaf97 · March 28, 2024, 1:15pm

I get this error message when I try to transcribe a one-minute voice recording. Here is the code:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "openai/whisper-small"
# Use the GPU with half precision when available, otherwise the CPU with full precision
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# audio_file is the recording to transcribe (defined elsewhere)
pipe(audio_file, return_timestamps=True, generate_kwargs={"language": "German"})

Is this important? What does this message mean, why does it appear, and what can I do about it?
Thank you for your answers and discussions.
