Is there a way to split a news article into subtopics?

Hello, is there a way I can perform text segmentation on news articles?
For example, a news article usually has one main topic, but it may also touch on several subtopics along the way. Is there a way to divide an article into those subsections/subtopics, so that it ends up with 2, 3, or more sections depending on the subtopics it discusses?

In case you are curious about what I need this for: I’m performing summarization on news articles. Instead of passing the whole article into the model at once, I want to divide it into sections based on what is discussed and then summarize each section. Basically, I’m trying to imitate what is done at summari.com.

I will appreciate it if someone has done something like this before, or if anybody knows a way I can work through it.

I’d recommend looking into BERTopic

Thanks for your response. I checked it out, but it doesn’t address what I’m trying to do.

BERTopic groups multiple articles into topics based on how frequently certain words appear in them.

But what I’m trying to do is different: given a single article, I want to divide it into sections/subtopics, if any are present.

You could break the article into paragraphs and run them through BERTopic, treating each paragraph as a separate document.
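
To make the paragraph idea concrete, here is a rough, library-free sketch. Note that it is not BERTopic: it swaps in a much simpler technique, comparing word overlap between neighbouring paragraphs and starting a new section wherever the overlap drops. The function names, the tiny stopword list, and the 0.1 threshold are all assumptions for illustration and would need tuning on real articles.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; extend it for real use).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "were", "it", "that", "this", "with", "as", "by", "at",
    "from", "be", "has", "have", "will",
}


def split_paragraphs(article):
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def bag_of_words(text):
    """Count lowercase words, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(article, threshold=0.1):
    """Group consecutive paragraphs into sections.

    A new section starts whenever a paragraph's similarity to the
    previous paragraph falls below `threshold` (a hypothetical default).
    """
    paragraphs = split_paragraphs(article)
    if not paragraphs:
        return []
    sections = [[paragraphs[0]]]
    for prev, cur in zip(paragraphs, paragraphs[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            sections.append([cur])  # topic shift: open a new section
        else:
            sections[-1].append(cur)  # same topic: extend current section
    return sections
```

Each returned section is a list of paragraphs, so joining a section with "\n\n" gives one chunk to summarize on its own. Replacing the word counts with sentence embeddings would probably track topic shifts better, at the cost of an extra dependency.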

Wow, I will try this out. Thank you.