_find_timestamp_sequence algorithm used in Whisper Pipeline

So I read through the current implementation of the Whisper ASR pipeline because I was very impressed by the results. The code contains a function that merges two overlapping transcriptions, along with their timestamps, working on the logits rather than the decoded tokens. It uses the longest common subsequence for this, which is usually not a bad choice, but in my opinion it is wrong here. Why not use the longest common substring? A subsequence match can align non-contiguous tokens and thus produce spurious overlaps, whereas a substring match requires a contiguous run, which fits this setting much better: Whisper is unlikely to insert tokens in the middle of one of the chunks. Only the first and last word that occurs in both transcripts should differ, for example when the speech is truncated in the middle of a word. Therefore, I would suggest using the longest common substring instead of the longest common subsequence, which is far more error prone here.
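To illustrate the suggestion, here is a minimal sketch of merging two overlapping token-ID sequences at their longest common substring. This is a hypothetical helper for illustration, not the transformers implementation: the real pipeline works on logits and timestamp tokens, while this sketch only handles plain token lists.

```python
def longest_common_substring(a, b):
    """Classic DP over two token lists.
    Returns (start_a, start_b, length) of the longest contiguous
    run of tokens shared by a and b (length 0 if none)."""
    best = (0, 0, 0)
    # prev[j] = length of the common suffix ending at a[i-1], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def merge_chunks(left, right):
    """Merge two overlapping token sequences at their longest
    contiguous overlap; fall back to concatenation if none exists.
    Hypothetical sketch, not the pipeline's actual merge logic."""
    start_l, start_r, length = longest_common_substring(left, right)
    if length == 0:
        return left + right
    # Keep left up to the end of the overlap, then append the
    # remainder of right after the overlap.
    return left[: start_l + length] + right[start_r + length:]
```

For example, `merge_chunks([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` finds the contiguous overlap `[3, 4, 5]` and yields `[1, 2, 3, 4, 5, 6, 7]`. Because the overlap must be contiguous, a stray token that happens to repeat elsewhere in the other chunk cannot drag the alignment apart the way a subsequence match can.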