Split document into sentences for sentence embedding


Is there a Hugging Face alternative to Gensim's split_sentences method that takes a document and splits it into sentences ready for model.encode()?

A first timer says many thanks :)

So you want to split a text into sentences and then create a sentence embedding for each one? Just use an NLP library like Stanza or spaCy to tokenize and sentence-segment your data. This is typically the first step in many NLP tasks.
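As a rough sketch of that approach with spaCy (v3 assumed): the rule-based `sentencizer` component splits on sentence-final punctuation and needs no model download. The helper names `split_into_sentences` and `embed_sentences`, and the model name `all-MiniLM-L6-v2`, are my own choices for illustration, and I'm assuming your `model.encode()` comes from the sentence-transformers package.

```python
import spacy

# Blank English pipeline plus the rule-based sentencizer.
# No trained model is downloaded; splitting is punctuation-based,
# so a full pipeline (e.g. one with a parser) may segment more accurately.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def split_into_sentences(text):
    """Return the sentences of `text` as a list of stripped strings."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]


def embed_sentences(sentences, model_name="all-MiniLM-L6-v2"):
    """Encode each sentence with sentence-transformers.

    Assumption: sentence-transformers is installed and `model_name`
    is a model you actually want; swap in your own.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(sentences)


sentences = split_into_sentences("Hello world. This is a test.")
print(sentences)
```

You would then pass the resulting list to `embed_sentences(sentences)`, or straight to your own `model.encode(sentences)`, to get one embedding per sentence.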


Indeed, I just wondered if Hugging Face had their own variant. Thanks!