No output from ASR Pipeline using Whisper

Hello all,

I am attempting to use Whisper to transcribe long-form audio recordings with pipeline(). I followed the brief tutorial on the openai/whisper-small.en model page under the “Long-form Transcription” section.

When I run the pipeline, it returns {'text': ''} with no error message or other context. I have tried passing both the path to the file (str) and the loaded audio’s waveform (array), and get the same result.
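For the array input, note that the ASR pipeline assumes the waveform is already at 16 kHz unless you also pass the sampling rate; a mismatched rate can produce empty or garbage transcripts with no error. This is a minimal sketch of the dict form the pipeline accepts (the silence array is just a stand-in for real audio):

```python
import numpy as np

# Stand-in waveform: one second of silence at Whisper's expected 16 kHz.
# In practice this would come from e.g. librosa.load(file_path, sr=16000).
sr = 16000
waveform = np.zeros(sr, dtype=np.float32)

# A bare array is assumed to already be 16 kHz; the dict form carries the
# true sampling rate so the pipeline can resample if needed.
sample = {"raw": waveform, "sampling_rate": sr}
# result = pipe(sample)
```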

I also tried loading a model and processor directly, calling model.generate(), and then decoding; the output comes back as an empty string, as above.
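For reference, this is roughly the manual path I mean (sketched here against the hub id rather than my local directory, with a silence array standing in for the real audio). One caveat worth noting: the feature extractor pads/truncates to a single 30-second window, so this path only ever transcribes the first 30 seconds of a long recording:

```python
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor from the hub (or a local dir with the same files).
processor = WhisperProcessor.from_pretrained("openai/whisper-small.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small.en")

# Stand-in waveform: one second of silence at 16 kHz.
waveform = np.zeros(16000, dtype=np.float32)

# The feature extractor pads/truncates to one 30 s log-mel window,
# so long-form audio needs the pipeline's chunking on top of this.
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```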

I’m not sure what is causing the empty string. The audio files themselves are fine: I have successfully transcribed them with OpenAI’s Whisper library.

This is the code I am running:

import torch
import transformers as tf
from transformers import pipeline

DEVICE = 0 if torch.cuda.is_available() else -1  # device index must be an int, not '0'
file_path = 'path/to/audio_file.wav'
model_path = '/path/to/model_dir'  # contains all the same files as the openai/whisper-small.en repo

model = tf.WhisperForConditionalGeneration.from_pretrained(model_path)
tokenizer = tf.WhisperTokenizerFast.from_pretrained(model_path)
extractor = tf.WhisperFeatureExtractor.from_pretrained(model_path)

pipe = pipeline(
    task='automatic-speech-recognition',
    model=model,
    tokenizer=tokenizer,
    feature_extractor=extractor,  # the keyword is feature_extractor, not extractor
    device=DEVICE,
)

result = pipe(file_path)
print(result)

Output:

{'text': ''}

Update: I resolved the issue. It turned out one of the downloaded config files was bad.
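For anyone hitting the same thing: one quick way to rule out a stale or corrupted local copy is to force a fresh download of the suspect file from the hub with huggingface_hub (shown here for config.json; the same works for generation_config.json etc.):

```python
from huggingface_hub import hf_hub_download

# force_download=True bypasses any cached (possibly corrupted) copy
# and fetches the file fresh from the openai/whisper-small.en repo.
config_path = hf_hub_download(
    "openai/whisper-small.en",
    "config.json",
    force_download=True,
)
print(config_path)
```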