I want to run speech transcription with the openai/whisper-medium model using pipeline, but I need the output to be in a specified language.
I tried passing generate_kwargs=dict(forced_decoder_ids=forced_decoder_ids), where forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe"), but the output is just an empty response: {'text': '', 'chunks': []}
I have managed to use Whisper with a pipeline for a specific language and task, which lets me take advantage of the smart chunking algorithm presented in this blog post.
My code is very similar to yours, except that I don't use WhisperProcessor. Instead, I declare the WhisperTokenizer and WhisperFeatureExtractor separately:
from transformers import WhisperForConditionalGeneration
from transformers import WhisperFeatureExtractor
from transformers import WhisperTokenizer
from transformers import pipeline
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-medium")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium", language="french", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
forced_decoder_ids = tokenizer.get_decoder_prompt_ids(language="french", task="transcribe")
asr_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=feature_extractor,
    tokenizer=tokenizer,
    chunk_length_s=30,
    stride_length_s=(4, 2),
)
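The snippet above builds forced_decoder_ids but does not show them being used. Here is a minimal sketch of how they might be passed at inference time, assuming the pipeline call accepts generate_kwargs in the same way as in the question. The helper name transcribe and the audio path are hypothetical, used only for illustration. Depending on your transformers version, passing generate_kwargs={"language": "french", "task": "transcribe"} may be the preferred route instead, so treat this as version-dependent.

```python
from typing import Any, Callable, Dict, List, Tuple

# (position, token_id) pairs, the shape returned by
# tokenizer.get_decoder_prompt_ids(language=..., task=...)
ForcedIds = List[Tuple[int, int]]


def transcribe(
    asr_pipe: Callable[..., Dict[str, Any]],
    audio: str,
    forced_decoder_ids: ForcedIds,
) -> str:
    """Hypothetical helper: run the pipeline with the language/task prompt forced."""
    result = asr_pipe(
        audio,
        # Forwarded to model.generate(); this is what pins the language and task.
        generate_kwargs={"forced_decoder_ids": forced_decoder_ids},
    )
    return result["text"]


# Usage with the objects defined above ("sample_fr.wav" is a placeholder path):
# text = transcribe(asr_pipe, "sample_fr.wav", forced_decoder_ids)
```

Keeping the call in a small function makes it easy to swap the generate_kwargs contents if your installed version expects language and task instead of forced_decoder_ids.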