Hey!
I am trying to find the right system, model, or idea for comparing a written speech with the transcript of that speech as it was delivered (i.e. comparing the output transcript to a reference).
I’ve been working with basics such as difflib and fuzz, but I can't get the matching to stick to word groups or sentences, because the speech is structured with line breaks and the transcript’s text isn’t. On top of that, the freely spoken speech didn’t follow the written version very closely. This, together with plenty of misunderstood words, makes it a nightmare for me to figure out.
I’ve been trying a sliding window approach, but fuzz keeps pulling in words that are nowhere near the actual passage, because I have failed to enforce the sequential order of the words.
My goal is to keep the speech’s line structure and add the matching transcript sentences below each line, with a ratio that shows how close the match is.
My initial approach works fine to a certain degree, but fails when neighboring sentences are too similar.
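To make the goal concrete, here is a minimal sketch of the kind of output I am after. It is not my current code, only one possible direction: a word-level alignment with `difflib.SequenceMatcher`, whose matching blocks always come back in increasing order in both texts, so the word sequence is preserved. The function names `tokenize` and `align_lines` and the sample sentences are hypothetical.

```python
import difflib
import re


def tokenize(text):
    # Lowercase and keep only word characters, so casing and punctuation are ignored.
    return re.findall(r"\w+", text.lower())


def align_lines(reference_lines, transcript):
    """Hypothetical helper: map each reference line to a transcript span, in order."""
    # Flatten the reference into words, remembering which line each word came from.
    ref_words, ref_line_ids = [], []
    for line_id, line in enumerate(reference_lines):
        for word in tokenize(line):
            ref_words.append(word)
            ref_line_ids.append(line_id)
    hyp_words = tokenize(transcript)

    # Matching blocks are monotonically increasing in both sequences,
    # so a later line can never match earlier transcript words.
    matcher = difflib.SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    spans = {}  # line_id -> [first, last] matched transcript word index
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            line_id = ref_line_ids[block.a + k]
            j = block.b + k
            if line_id not in spans:
                spans[line_id] = [j, j]
            else:
                spans[line_id][1] = j

    results = []
    for line_id, line in enumerate(reference_lines):
        if line_id in spans:
            start, end = spans[line_id]
            segment = hyp_words[start:end + 1]
        else:
            segment = []  # no word of this line was found in the transcript
        ratio = difflib.SequenceMatcher(
            None, tokenize(line), segment, autojunk=False
        ).ratio()
        results.append((line, " ".join(segment), round(ratio, 2)))
    return results


if __name__ == "__main__":
    speech = [
        "Good evening ladies and gentlemen",
        "Tonight we talk about the future",
    ]
    spoken = "good evening ladies and gentleman tonight we will talk about our future"
    for line, segment, ratio in align_lines(speech, spoken):
        print(f"{line}\n  -> {segment}  ({ratio})")
```

A known limitation of this sketch: transcript words that fall between two matched spans (such as a misrecognized word at the end of a line) are not assigned to any line, and exact word matching does nothing for misheard words on its own.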
Does anyone have an idea on how to solve this, or could you point me in the right direction?
Cheers