Hi,
I'm wondering whether Hugging Face has an alternative to Gensim's split_sentences method that takes a document and splits it into sentences ready for model.encode()?
A first timer says many thanks
So you want to split a text into sentences and then create a sentence embedding for each sentence? Just use a parser like Stanza or spaCy to tokenize and sentence-segment your data. This is typically the first step in many NLP tasks.
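For reference, here is a minimal sketch of that approach, assuming spaCy v3 and its rule-based "sentencizer" component, which needs no model download. The function name split_into_sentences is made up for illustration, and the regex fallback is only a crude stand-in for when spaCy is not installed, not a substitute for a real segmenter.

```python
import re

try:
    import spacy  # assumes spaCy v3 (add_pipe takes a component name)
except ImportError:
    spacy = None


def split_into_sentences(text):
    """Split a document into a list of sentence strings (hypothetical helper)."""
    if spacy is not None:
        # Blank English pipeline plus the rule-based sentencizer,
        # which splits on sentence-final punctuation.
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")
        return [s.text.strip() for s in nlp(text).sents if s.text.strip()]
    # Crude fallback: split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sentences = split_into_sentences("Hello world. This is a test. Is it working?")
print(sentences)
```

The resulting list of strings is the kind of input you would then pass to model.encode(sentences). For messier text (abbreviations, quotes, missing punctuation), a trained spaCy or Stanza pipeline will likely segment better than the rule-based component used here.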
Indeed, I just wondered if Hugging Face had their own variant. Thanks