I am exploring different summarization models for news articles and am struggling to work out how to limit the number of sentences and the number of characters per sentence using pipelines, or whether this is even possible (or a silly question to begin with!).
I have the following setup, which is passed the article text and a model name of ‘facebook/bart-large-cnn’, ‘google/pegasus-cnn_dailymail’ or ‘sshleifer/distilbart-cnn-6-6’:
from transformers import pipeline
summarizer = pipeline("summarization", model=model_name)
summarized = summarizer(article_text, max_length=118, clean_up_tokenization_spaces=True, truncation=True)
The articles range in length from 100 words to 1000 words.
I am hoping to limit the summary to three sentences and, more importantly, to cap each sentence at 118 characters, which is a hard limit for my application. When I set max_length to 118, the sentences are usually below this limit, but some run to around 220 characters, and sometimes the summary is simply cut off mid-sentence at the end.
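To make the requirement concrete, here is a minimal sketch of the check I would want the output to pass. The helper names (split_sentences, check_summary) and the naive regex sentence splitter are my own assumptions for illustration, not part of transformers. It only measures the summary after generation; it does not make the model respect the limits.

```python
import re

MAX_SENTENCES = 3   # desired number of sentences in the summary
MAX_CHARS = 118     # hard cap on characters per sentence


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def check_summary(summary_text, max_sentences=MAX_SENTENCES, max_chars=MAX_CHARS):
    """Keep the first max_sentences sentences and report any that exceed max_chars."""
    sentences = split_sentences(summary_text)
    kept = sentences[:max_sentences]
    too_long = [s for s in kept if len(s) > max_chars]
    return kept, too_long
```

With the setup above, this could be called as kept, too_long = check_summary(summarized[0]["summary_text"]), since the summarization pipeline returns a list of dicts with a summary_text key.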
Would be wonderful if someone could let me know what I’m doing wrong!