Split document into sentences for sentence embedding

Hi,

I'm wondering whether Hugging Face has an alternative to Gensim's split_sentences method, i.e. something that takes a document and splits it into sentences ready for model.encode()?

https://www.kite.com/python/docs/gensim.summarization.textcleaner.split_sentences

A first timer says many thanks :slight_smile:

So you want to split a text into sentences and then create a sentence embedding for each sentence? Just use a parser like Stanza or spaCy to tokenize and sentence-segment your data. This is typically the first step in many NLP tasks.
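
For what it's worth, here is a minimal sketch of the spaCy route, assuming spaCy v3 is installed. The helper name `split_sentences` is just illustrative (borrowed from the Gensim function in the question), and the rule-based `sentencizer` is used only because it needs no model download; a trained pipeline would likely segment messy text better.

```python
import spacy

# A blank English pipeline plus the rule-based "sentencizer" component.
# Assumption: spaCy v3 (in v2 the add_pipe call looks different).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def split_sentences(document: str) -> list[str]:
    """Return the non-empty sentences of `document` as stripped strings."""
    doc = nlp(document)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]


sentences = split_sentences(
    "This is the first sentence. Here is a second one! Is this the third?"
)
print(sentences)

# Assumed downstream step (not run here): with a sentence-transformers
# model already loaded as `model`, the list can be embedded with
# embeddings = model.encode(sentences)
```

The resulting list of strings should be in the shape model.encode() expects, one embedding per sentence.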

Indeed, I just wondered whether Hugging Face had its own variant. Thanks!