Hi!
Whisper obtains confidence scores differently from most other transformer models.
From the Whisper paper:
Whisper relies on accurate prediction of the timestamp tokens to determine the
amount to shift the model’s 30-second audio context window by, and inaccurate transcription in one window may negatively impact transcription in the subsequent windows.
We have developed a set of heuristics that help avoid failure cases of long-form transcription, which is applied in the results reported in sections 3.8 and 3.9. First, we use beam search with 5 beams using the log probability as the score function, to reduce repetition looping which happens more frequently in greedy decoding. We start with temperature 0, i.e. always selecting the tokens with the highest probability, and increase the temperature by 0.2 up to 1.0 when either the average log probability over the generated tokens is lower than −1 or the generated text has a gzip compression rate higher than 2.4.
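The fallback heuristic from the quote can be sketched in plain Python. This is a simplified illustration, not Whisper's actual implementation: `decode_at` is a hypothetical stand-in for running the decoder at a given temperature, and the compression ratio is computed with `zlib` (raw byte length divided by compressed length), which is how repetition loops are detected — looping text compresses extremely well.

```python
import zlib

def compression_ratio(text: str) -> float:
    # Ratio of raw byte length to zlib-compressed length; highly
    # repetitive text compresses well and yields a high ratio.
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def needs_fallback(avg_logprob: float, text: str,
                   logprob_threshold: float = -1.0,
                   compression_threshold: float = 2.4) -> bool:
    # Retry at a higher temperature when the model is unconfident on
    # average OR the output looks like a repetition loop.
    return (avg_logprob < logprob_threshold
            or compression_ratio(text) > compression_threshold)

def decode_with_fallback(decode_at):
    # decode_at(t) is a hypothetical callable that decodes at temperature t
    # and returns (avg_logprob, text). Temperatures follow the paper:
    # start at 0 and step by 0.2 up to 1.0.
    for t in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        avg_logprob, text = decode_at(t)
        if not needs_fallback(avg_logprob, text):
            return t, text
    return 1.0, text  # last attempt is kept even if it still fails the checks
```

The thresholds (−1 average log probability, 2.4 compression ratio) are the ones stated in the quoted passage.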
If you look at the update method in decoding.py in the Whisper codebase, you can see that the model computes a log probability for each token and then ranks candidate sequences by their cumulative log probability. Crucially, these log probabilities are sequence-level confidences: they represent the confidence of an entire sequence of tokens rather than of individual tokens.
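A minimal sketch of what that sequence-level scoring looks like (this is an illustration of the general idea, not Whisper's code): summing per-token log probabilities gives the log probability of the whole sequence, which is what beam search ranks on; dividing by the token count gives the average log probability used by the fallback heuristic, and exponentiating that gives a rough per-token confidence in (0, 1].

```python
import math

def sequence_scores(token_logprobs: list[float]) -> tuple[float, float, float]:
    # Cumulative log prob = log probability of the whole sequence
    # (the sequence-level ranking score).
    cum_logprob = sum(token_logprobs)
    # Average log prob per token (compared against the -1 threshold).
    avg_logprob = cum_logprob / len(token_logprobs)
    # exp(avg) is a rough geometric-mean per-token confidence.
    return cum_logprob, avg_logprob, math.exp(avg_logprob)

# Ranking two candidate sequences: the one with the higher
# cumulative log probability wins, regardless of individual tokens.
beam_a = [-0.1, -0.3, -0.2]
beam_b = [-0.05, -1.5, -0.1]
best = max([beam_a, beam_b], key=lambda toks: sum(toks))
```

Note there is no per-token confidence here: a sequence can rank highly overall even if one of its tokens was individually low-probability (as in `beam_b` above).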
Hope this helps!