*I accidentally deleted my old post: I misclicked the trash bin, and since there was no "Do you really want to delete this post?" confirmation popup, it was irreversibly deleted. Apologies for the duplicate; the original is no longer visible.*
I’m trying to evaluate Whisper on a dataset and ran the Beam search version with:
Yet the Greedy variant performed better. It is basically the same code with only temperature and language set and nothing else, so it uses Greedy decoding by default.
Am I missing a different hyperparameter, or is there a reason why Greedy performs better? I did go over the code, but to my understanding, if temperature=0 and beam_size is greater than 1, it will select the BeamSearch decoder and should just work…
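For illustration, the two configurations being compared typically look something like this (a hypothetical sketch, not the lost command from the deleted post; the option names are those accepted by whisper's `model.transcribe`):

```python
# Hypothetical decode options for the two runs being compared.
# Keys follow openai-whisper's transcribe()/DecodingOptions names.
beam_options = {
    "language": "en",
    "temperature": 0,   # deterministic decoding
    "beam_size": 5,     # should select the BeamSearchDecoder
}

greedy_options = {
    "language": "en",
    "temperature": 0,   # no beam_size -> greedy decoding by default
}

# e.g. result = model.transcribe(audio, **beam_options)
print(sorted(beam_options.keys() - greedy_options.keys()))  # ['beam_size']
```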
The discrepancy between Greedy Search and Beam Search performance often stems from hyperparameter tuning or dataset characteristics. Beam Search can outperform Greedy Search, but it requires careful adjustment of parameters like:
Beam Size: Experiment with different values (e.g., 3, 5, 10). Values that are too large can let unlikely hypotheses dominate.
Length Penalty: A value of 0.6 might favor shorter sequences. Try increasing it closer to 1.0 for longer outputs.
Temperature: While temperature=0 is deterministic, a slight increase (e.g., 0.3) can improve diversity in Beam Search.
Patience: Test slightly higher values (e.g., 1.5) to explore more hypotheses.
Additionally, the dataset’s clarity and length can impact results. If your dataset has short, unambiguous utterances, Greedy Search might naturally perform better. Verify that language settings (language='en') are consistent and check if evaluation metrics align with your decoding objectives.
Beam Search may also suffer from length bias or suboptimal scoring. Inspect intermediate hypotheses to identify potential issues. By tuning the parameters and aligning with your dataset, Beam Search performance should improve.
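A minimal sketch of sweeping the settings above (hypothetical values; it only builds the option dicts you would pass to something like `model.transcribe(audio, **opts)`):

```python
from itertools import product

# Hypothetical grid over the decoding hyperparameters discussed above.
beam_sizes = [3, 5, 10]
length_penalties = [0.6, 0.8, 1.0]
patiences = [1.0, 1.5]

configs = [
    {
        "temperature": 0,          # beam search requires temperature=0
        "language": "en",
        "beam_size": b,
        "length_penalty": lp,
        "patience": p,
    }
    for b, lp, p in product(beam_sizes, length_penalties, patiences)
]
print(len(configs))  # 18 combinations to evaluate
```

Evaluating each configuration against your metric (e.g., WER) would show whether any beam-search setting beats the greedy baseline on your dataset.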
```python
for t in temperatures:
    kwargs = {**decode_options}
    if t > 0:
        # disable beam_size and patience when t > 0
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # disable best_of when t == 0
        kwargs.pop("best_of", None)
```
So I thought if temperature isn’t 0, it won’t even use beam search.
You’re correct: the source code snippet shows that Beam Search is only used when temperature=0. If the temperature is set above 0, the code explicitly drops beam_size and patience, meaning Beam Search is effectively turned off and a different decoding strategy is used instead.
Beam Search relies on deterministic token selection, which requires temperature=0 to ensure consistency and make meaningful comparisons among hypotheses in the beam.
When temperature > 0, the model introduces randomness into token selection, which is incompatible with the deterministic nature of Beam Search. As a result, the code switches to sampling-based decoding.
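The branching can be sketched as a tiny standalone function (a hypothetical helper, not part of the whisper API, mirroring the option pruning in the snippet above):

```python
def effective_strategy(temperature, decode_options):
    """Return which decoding strategy would effectively run.

    Hypothetical illustration of how whisper's transcribe() prunes
    decode options per temperature before choosing a decoder.
    """
    kwargs = {**decode_options}
    if temperature > 0:
        # transcribe() drops beam_size/patience, so decoding
        # falls back to sampling rather than beam search
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        kwargs.pop("best_of", None)

    if temperature == 0 and kwargs.get("beam_size"):
        return "beam_search"
    if temperature > 0:
        return "sampling"
    return "greedy"

print(effective_strategy(0.0, {"beam_size": 5, "patience": 1.0}))  # beam_search
print(effective_strategy(0.3, {"beam_size": 5, "best_of": 5}))     # sampling
print(effective_strategy(0.0, {}))                                 # greedy
```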
So to ensure Beam Search is applied, I recommend always setting temperature=0 and specifying beam_size. If you want to experiment with randomness, you’ll need to use sampling-based methods (e.g., best_of) instead. It’s worth double-checking your code to confirm temperature=0 is being enforced wherever you expect Beam Search to run.