Text Comparison of Transcript and Written Text

Bewq · July 16, 2024, 1:41pm

Hey!

I am trying to find the right system, model, or idea for comparing the written text of a speech with the transcript of that speech as it was delivered (i.e., comparing the output transcript against a reference).

I’ve been working with basic tools such as difflib and fuzz, but I can’t get the matching to stick to word groups or sentences, because the speech is structured with line breaks and the transcript’s text isn’t. On top of that, the freely spoken speech didn’t follow the written version very closely. This, together with plenty of misunderstood words, makes it a nightmare for me to figure out.

I’ve been trying a sliding window approach, but fuzz keeps matching words that are nowhere near the actual passage, because I haven’t managed to enforce the sequential order of the words.

My goal is to keep the speech’s line structure and add the matching transcript sentences below each line, with a ratio that shows me how close the match is.

My initial approach works fine to a certain degree, but fails when neighboring sentences are too similar.
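
To make the goal concrete, here is a minimal sketch of one order-preserving way to get that output, using only difflib from the standard library. It is not the sliding window approach described above. It assumes that aligning both texts word by word in a single pass is acceptable, and the names `tokenize` and `align_lines` as well as the sample texts are made up for illustration. Because the matching blocks from `SequenceMatcher` only ever move forward in both texts, a line cannot pick up words from a distant part of the transcript.

```python
import difflib
import re


def tokenize(text):
    # Lowercase word tokens, so casing and punctuation do not affect matching.
    return re.findall(r"[\w']+", text.lower())


def align_lines(reference, transcript):
    """Return (reference_line, transcript_segment, ratio) for each non-empty line."""
    ref_lines = [ln for ln in reference.splitlines() if ln.strip()]

    # Flatten the reference into one word list, remembering each word's line.
    ref_words, line_of_word = [], []
    for i, ln in enumerate(ref_lines):
        for w in tokenize(ln):
            ref_words.append(w)
            line_of_word.append(i)
    hyp_words = tokenize(transcript)

    # One global word-level alignment; matching blocks are strictly in order.
    sm = difflib.SequenceMatcher(None, ref_words, hyp_words, autojunk=False)

    # For each line, track the first and last transcript index it matched.
    spans = [None] * len(ref_lines)
    for a, b, size in sm.get_matching_blocks():
        for k in range(size):
            i, j = line_of_word[a + k], b + k
            lo, hi = spans[i] if spans[i] else (j, j)
            spans[i] = (min(lo, j), max(hi, j))

    results = []
    for i, ln in enumerate(ref_lines):
        if spans[i] is None:
            # No word of this line was found in the transcript.
            results.append((ln, "", 0.0))
            continue
        lo, hi = spans[i]
        segment = hyp_words[lo:hi + 1]
        ratio = difflib.SequenceMatcher(
            None, tokenize(ln), segment, autojunk=False
        ).ratio()
        results.append((ln, " ".join(segment), round(ratio, 2)))
    return results


# Made-up sample data, only to show the output shape.
reference = "Good morning everyone\nToday we talk about budgets\nThank you for coming"
transcript = "good morning everybody today we will talk about the budgets thank you all for coming"

for line, segment, ratio in align_lines(reference, transcript):
    print(line)
    print(f"  -> {segment}  ({ratio})")
```

One limitation of this sketch: transcript words that match nothing and sit between two lines (like "everybody" in the sample) are not assigned to either line, and exact word matching means misrecognized words only lower the ratio rather than being matched fuzzily.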

Does anyone have an idea how to solve this, or could you point me in the right direction?

Cheers
