Hi, I want to develop an auto-chaptering feature like the ones in YouTube or AssemblyAI.
But I could not find a blog post or an article about it.
My naive approach is something like this:
- Split the entire text into chunks.
  e.g. split a 2000-word text into 40 chunks of 50 words each.
- Use sentence-transformers to get an embedding for each chunk.
- If the embedding of chunk n is significantly different from the embedding of chunk n-1, I can treat that as a topic change.
- Summarize the chunks between topic changes to create a chapter summary.
  e.g. the 1st chapter is chunks 0 to 15, the 2nd chapter is chunks 16 to 35, the 3rd chapter is chunks 36 to 50.
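
The first three steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function names, the cosine-similarity measure, and the 0.3 threshold are all made up for illustration, and `bag_of_words_embed` is a toy stand-in so the code runs with the standard library only. With sentence-transformers installed, something like `SentenceTransformer("all-MiniLM-L6-v2").encode` could be passed as `embed` instead. The summarization step is left out.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def split_into_chunks(text: str, words_per_chunk: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def bag_of_words_embed(chunks: List[str]) -> List[List[float]]:
    """Toy embedding (assumption): word counts over a shared vocabulary.

    A real sentence embedding model would replace this function.
    """
    tokenized = [re.findall(r"[a-z']+", chunk.lower()) for chunk in chunks]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(tokens).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vectors


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_chapters(
    chunks: List[str],
    embed: Callable[[List[str]], Sequence[Sequence[float]]] = bag_of_words_embed,
    threshold: float = 0.3,  # hypothetical value; needs tuning on real data
) -> List[Tuple[int, int]]:
    """Return inclusive (first_chunk, last_chunk) index ranges, one per chapter.

    A new chapter starts wherever chunk n is less similar to chunk n-1
    than the threshold.
    """
    if not chunks:
        return []
    vectors = embed(chunks)
    chapters = []
    start = 0
    for i in range(1, len(chunks)):
        if cosine_similarity(vectors[i - 1], vectors[i]) < threshold:
            chapters.append((start, i - 1))
            start = i
    chapters.append((start, len(chunks) - 1))
    return chapters
```

Calling `find_chapters(split_into_chunks(transcript))` would give the chunk ranges to summarize, where `transcript` is the full text. Comparing only neighboring chunks is sensitive to noise, so the threshold and chunk size would probably need tuning.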
I’m pretty sure there is a much more sophisticated way.
Please give me a hint if anyone knows the proper methods.
Thanks in advance.