_find_timestamp_sequence algorithm used in Whisper Pipeline

So I read through the current implementation of the Whisper ASR pipeline because I was very impressed by the results. The code contains a function that merges two overlapping transcriptions, along with their timestamps, working on the logits rather than the decoded tokens. It uses the longest common subsequence for this, which is usually not a bad choice, but in my opinion it is wrong here. Why not use the longest common substring? A subsequence match can align non-contiguous tokens and thus produce spurious overlaps, whereas a substring match requires a contiguous run, which fits this setting much better: Whisper is unlikely to insert tokens in the middle of one of the chunks. Only the first and last word that occurs in both transcripts should differ, for example when the speech is truncated in the middle of a word. Therefore, I would suggest using the longest common substring instead of the longest common subsequence, which is far more error prone here.
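To illustrate the suggestion, here is a minimal sketch of merging two overlapping token-ID sequences at their longest common substring. This is a hypothetical helper for illustration, not the transformers implementation: the real pipeline works on logits and timestamp tokens, while this sketch only handles plain token lists.

```python
def longest_common_substring(a, b):
    """Classic DP over two token lists.
    Returns (start_a, start_b, length) of the longest contiguous
    run of tokens shared by a and b (length 0 if none)."""
    best = (0, 0, 0)
    # prev[j] = length of the common suffix ending at a[i-1], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def merge_chunks(left, right):
    """Merge two overlapping token sequences at their longest
    contiguous overlap; fall back to concatenation if none exists.
    Hypothetical sketch, not the pipeline's actual merge logic."""
    start_l, start_r, length = longest_common_substring(left, right)
    if length == 0:
        return left + right
    # Keep left up to the end of the overlap, then append the
    # remainder of right after the overlap.
    return left[: start_l + length] + right[start_r + length:]
```

For example, `merge_chunks([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` finds the contiguous overlap `[3, 4, 5]` and yields `[1, 2, 3, 4, 5, 6, 7]`. Because the overlap must be contiguous, a stray token that happens to repeat elsewhere in the other chunk cannot drag the alignment apart the way a subsequence match can.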