Generating abstractive summaries

I have tried a few models for abstractive summarization, but none of them produces a reasonable summary, which I define as a single, short paragraph. You can see the models tried so far in the code. The one in use in the code below is facebook/bart-large, which produces a summary longer than the original article.

Original Article: 485 words
Summary: 685 words

I would appreciate your help with model selection or an alternative approach to summarization.

import torch
from transformers import pipeline

# Read the article; the context manager closes the file afterwards.
with open("Data/article.txt", "r") as file:
    article = file.read()

#  model_name = 'google/pegasus-newsroom'
#  model_name = 'Artifact-AI/led_base_16384_billsum_summarization'

model_name = 'facebook/bart-large'

summarizer = pipeline('summarization', model=model_name,
                      max_new_tokens=1024,
                      truncation=True, framework='pt')
summary = summarizer(article)
print(summary)

Try using a smaller max_new_tokens.
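
One way to act on this suggestion is to tie the token budget to the article length instead of hard-coding 1024. The sketch below is only an illustration: summary_token_budget and its ratio, floor, and ceiling defaults are made-up names and values, not part of transformers. Tokens are also not the same as words, so the result is a rough cap rather than an exact summary length.

```python
def summary_token_budget(article_words, ratio=0.25, floor=30, ceiling=150):
    """Hypothetical helper: allow roughly `ratio` of the article length,
    clamped to the range [floor, ceiling]."""
    return max(floor, min(ceiling, int(article_words * ratio)))


def summarize_with_budget(article, model_name):
    # Imported here so the helper above can be used without loading transformers.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name, framework="pt")
    budget = summary_token_budget(len(article.split()))
    # Generation arguments passed at call time are forwarded to generate().
    result = summarizer(article, max_new_tokens=budget, truncation=True)
    return result[0]["summary_text"]
```

For the 485-word article in the question, these assumed defaults give a budget of 121 new tokens.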

I tried various models and found some interesting results.
The model ‘sshleifer/distilbart-cnn-12-6’ is the default summarization model when using PyTorch. See the Pipelines documentation.

With PyTorch, the default model can be used as follows:

from transformers import pipeline

# Read the article; the context manager closes the file afterwards.
with open("Data/article.txt", "r") as file:
    article = file.read()

summarizer = pipeline('summarization')
summary = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(summary)

I could also use ‘facebook/bart-large-cnn’, another good abstractive summarization model with similar performance. To use it, pass the model name to the pipeline with the ‘model=’ parameter.
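
For reference, a minimal sketch of that variant. The function names build_summarizer and summarize_text are only illustrative, and the length settings are the same ones used with the default model above; they are not tuned for this model.

```python
def build_summarizer(model_name="facebook/bart-large-cnn"):
    # Imported lazily so summarize_text can be exercised without transformers.
    from transformers import pipeline

    # The model is selected with the model= parameter.
    return pipeline("summarization", model=model_name)


def summarize_text(article, summarizer, max_length=130, min_length=30):
    """Run a summarization pipeline (or any compatible callable) on one text."""
    result = summarizer(
        article,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
    )
    # The pipeline returns a list with one dict per input text.
    return result[0]["summary_text"]
```

Usage would look like summarize_text(article, build_summarizer()), where article is the text read from Data/article.txt.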

Both models produce good summaries of reasonable length in about 30 seconds on my PC, which has no GPU.
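
If you want to check length and run time on your own machine, something like the following could be used. It is a rough sketch: measure and word_count are hypothetical helpers, and splitting on whitespace only approximates a word count.

```python
import time


def word_count(text):
    # Whitespace split; an approximation of the word count.
    return len(text.split())


def measure(summarize_fn, article):
    """Time one summarization call and compare word counts."""
    start = time.perf_counter()
    summary = summarize_fn(article)
    elapsed = time.perf_counter() - start
    return {
        "article_words": word_count(article),
        "summary_words": word_count(summary),
        "seconds": elapsed,
        "summary": summary,
    }
```

Here summarize_fn is any function that takes the article text and returns the summary string, for example a small wrapper that calls the pipeline and returns the 'summary_text' field of the first result.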

Results follow: