How to set language in Whisper pipeline for audio transcription?

I want to transcribe speech with the openai/whisper-medium model using a pipeline,

but I need the output to be in a language I specify.

I tried generate_kwargs=dict(forced_decoder_ids=forced_decoder_ids), where forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe"), but the output is just an empty response: {'text': '', 'chunks': []}.

Is there a way to set the language?


You can’t, AFAIK, but you can get similar, if not identical, results this way:

import librosa

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import Audio, load_dataset

MAX_INPUT_LENGTH = 16000 * 30  # 30 s of audio at 16 kHz, Whisper's window size

# load model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")

# load the audio and split it into non-overlapping 30-second chunks
sample, sr = librosa.load("audio.WAV", sr=16000)
sample_batch = [sample[i:i + MAX_INPUT_LENGTH] for i in range(0, len(sample), MAX_INPUT_LENGTH)]
input_features = processor(sample_batch, sampling_rate=sr, return_tensors="pt").input_features

# generate token ids
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
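Note that batch_decode returns one string per 30-second chunk. A minimal sketch of stitching the pieces into a single transcript (plain concatenation — an assumption on my part; since the chunks don't overlap, a boundary can still split a word):

```python
# batch_decode gives a list with one decoded string per chunk.
# Dummy chunk outputs stand in for real model output here.
transcription = [" Bonjour tout le monde,", " comment allez-vous ?"]

# Concatenate the per-chunk strings and trim the leading space.
full_text = "".join(transcription).strip()
print(full_text)  # Bonjour tout le monde, comment allez-vous ?
```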


I have successfully managed to use Whisper with a pipeline on a specific language/task, thereby taking advantage of the smart chunking algorithm presented in this blog post.
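For intuition, here is a rough sketch of that chunking idea — my own illustration of the blog post's approach, not the pipeline's actual code. The audio is cut into overlapping windows, with left and right strides of context around each chunk:

```python
import numpy as np

SR = 16000                                    # sampling rate in Hz
chunk_s, stride_left_s, stride_right_s = 30, 4, 2

sample = np.zeros(SR * 70, dtype=np.float32)  # 70 s dummy signal

chunk = SR * chunk_s
# Each window advances by the chunk length minus the two stride overlaps,
# so consecutive windows share (stride_left + stride_right) seconds of audio.
step = chunk - SR * (stride_left_s + stride_right_s)

windows = [sample[i:i + chunk] for i in range(0, len(sample), step)]
print(len(windows))  # 3 overlapping windows for 70 s of audio
```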

My code is very similar to yours, except that I don’t use WhisperProcessor. Instead, I declare the WhisperTokenizer and WhisperFeatureExtractor separately:

from transformers import WhisperForConditionalGeneration
from transformers import WhisperFeatureExtractor
from transformers import WhisperTokenizer
from transformers import pipeline

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-medium")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium", language="french", task="transcribe")

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
forced_decoder_ids = tokenizer.get_decoder_prompt_ids(language="french", task="transcribe")

asr_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor,
    chunk_length_s=30,
    stride_length_s=(4, 2),
)

Then you can use generate_kwargs as follows:

        transcription = asr_pipe("audio.WAV", generate_kwargs={"forced_decoder_ids": forced_decoder_ids})

Hope this helps!
