*I accidentally deleted my old post: I misclicked the trash bin, and since there was no "Do you really want to delete this post?" confirmation popup, it was irreversibly deleted. Apologies for the duplicate; the original is no longer visible.*
I’m trying to evaluate Whisper on a dataset and ran the Beam search version with:
Yet the Greedy variant performed better. It is basically the same code with only temperature and language set and nothing else, so it uses Greedy decoding by default.
Am I missing a different hyperparameter, or is there a reason why Greedy performs better? I did go over the code, but to my understanding, if temperature=0 and beam_size is greater than 1, it will select the BeamSearch decoder and should just work…
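For illustration, the two configurations being compared typically look something like this (a hypothetical sketch, not the lost command from the deleted post; the option names are those accepted by whisper's `model.transcribe`):

```python
# Hypothetical decode options for the two runs being compared.
# Keys follow openai-whisper's transcribe()/DecodingOptions names.
beam_options = {
    "language": "en",
    "temperature": 0,   # deterministic decoding
    "beam_size": 5,     # should select the BeamSearchDecoder
}

greedy_options = {
    "language": "en",
    "temperature": 0,   # no beam_size -> greedy decoding by default
}

# e.g. result = model.transcribe(audio, **beam_options)
print(sorted(beam_options.keys() - greedy_options.keys()))  # ['beam_size']
```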
The discrepancy between Greedy Search and Beam Search performance often stems from hyperparameter tuning or dataset characteristics. Beam Search can outperform Greedy Search, but it requires careful adjustment of parameters like:
Beam Size: Experiment with different values (e.g., 3, 5, 10). Values that are too large can let unlikely hypotheses dominate.
Length Penalty: A value of 0.6 might favor shorter sequences. Try increasing it closer to 1.0 for longer outputs.
Temperature: While temperature=0 is deterministic, a slight increase (e.g., 0.3) can improve diversity in Beam Search.
Patience: Test slightly higher values (e.g., 1.5) to explore more hypotheses.
Additionally, the dataset’s clarity and length can impact results. If your dataset has short, unambiguous utterances, Greedy Search might naturally perform better. Verify that language settings (language='en') are consistent and check if evaluation metrics align with your decoding objectives.
Beam Search may also suffer from length bias or suboptimal scoring. Inspect intermediate hypotheses to identify potential issues. By tuning the parameters and aligning with your dataset, Beam Search performance should improve.
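A minimal sketch of sweeping the settings above (hypothetical values; it only builds the option dicts you would pass to something like `model.transcribe(audio, **opts)`):

```python
from itertools import product

# Hypothetical grid over the decoding hyperparameters discussed above.
beam_sizes = [3, 5, 10]
length_penalties = [0.6, 0.8, 1.0]
patiences = [1.0, 1.5]

configs = [
    {
        "temperature": 0,          # beam search requires temperature=0
        "language": "en",
        "beam_size": b,
        "length_penalty": lp,
        "patience": p,
    }
    for b, lp, p in product(beam_sizes, length_penalties, patiences)
]
print(len(configs))  # 18 combinations to evaluate
```

Evaluating each configuration against your metric (e.g., WER) would show whether any beam-search setting beats the greedy baseline on your dataset.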
```python
for t in temperatures:
    kwargs = {**decode_options}
    if t > 0:
        # disable beam_size and patience when t > 0
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # disable best_of when t == 0
        kwargs.pop("best_of", None)
```
So I thought if temperature isn’t 0, it won’t even use beam search.
You’re correct: the source code snippet shows that Beam Search is only used when temperature=0. If the temperature is set above 0, the code explicitly drops beam_size and patience, meaning Beam Search is effectively turned off and a different decoding strategy is used instead.
Beam Search relies on deterministic token selection, which requires temperature=0 to ensure consistency and make meaningful comparisons among hypotheses in the beam.
When temperature > 0, the model introduces randomness into token selection, which is incompatible with the deterministic nature of Beam Search. As a result, the code switches to sampling-based decoding.
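The branching can be sketched as a tiny standalone function (a hypothetical helper, not part of the whisper API, mirroring the option pruning in the snippet above):

```python
def effective_strategy(temperature, decode_options):
    """Return which decoding strategy would effectively run.

    Hypothetical illustration of how whisper's transcribe() prunes
    decode options per temperature before choosing a decoder.
    """
    kwargs = {**decode_options}
    if temperature > 0:
        # transcribe() drops beam_size/patience, so decoding
        # falls back to sampling rather than beam search
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        kwargs.pop("best_of", None)

    if temperature == 0 and kwargs.get("beam_size"):
        return "beam_search"
    if temperature > 0:
        return "sampling"
    return "greedy"

print(effective_strategy(0.0, {"beam_size": 5, "patience": 1.0}))  # beam_search
print(effective_strategy(0.3, {"beam_size": 5, "best_of": 5}))     # sampling
print(effective_strategy(0.0, {}))                                 # greedy
```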
So to ensure Beam Search is applied, I recommend always setting temperature=0 and specifying beam_size. If you want to experiment with randomness, you’ll need to use sampling-based methods (e.g., best_of) instead. It’s worth double-checking your code to confirm temperature=0 is being enforced wherever you expect Beam Search to run.