I am using a test article as input and trying to generate a reasonably long summary, but I keep getting very short, single-line summaries from a very long article. I would appreciate any help.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Read the article to summarize.
with open("Data/article.txt", "r") as file:
    article = file.read()
model_name = 'google/pegasus-xsum'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Truncate the article so it fits the model's input limit.
inputs = tokenizer(article, max_length=510, truncation=True, return_tensors='pt').input_ids
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
outputs = model.generate(inputs, max_new_tokens=510, do_sample=False)
# generate() returns a batch, so decode the first sequence.
# Length limits such as min_length/max_length are generate() options, not decode() options.
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
The input article:
With Wednesday’s CNN town hall behind him, former President Donald Trump remains both the prohibitive GOP front-runner for the 2024 nomination and a man who was found liable this week in a civil case for sexually abusing and defaming former magazine columnist E. Jean Carroll. While we cannot yet know what effect that verdict will have on the race for the Republican nomination, Trump built his large polling advantage with this civil trial in the news and after being indicted earlier this spring in a separate criminal case related to hush money payments to Stormy Daniels. (Trump has denied all wrongdoing.) Trump’s edge in surveys of Republican voters and in endorsements from elected officials at this stage is among the strongest for a nonincumbent in the modern presidential primary era. Trump is polling, on average, north of 50% in national polls of likely GOP primary voters. His nearest potential challenger – Florida Gov. Ron DeSantis, who has yet to launch a campaign – is earning a little north of 20% of the Republican primary vote on average. No other potential Republican candidate is in double digits. There are very few candidates, from either party, in nonincumbent races who were near or north of 50% in the national primary polls this early on. That select few includes Republicans Bob Dole in 1996 and George W. Bush in 2000, and Democrats Al Gore in 2000 and Hillary Clinton in 2016. All of those candidates won their party’s nominations, and none of those races were particularly close. Interestingly, all the legal controversies involving Trump have not hurt him in the polls. At the beginning of the year, Trump was earning a little more than 40% of the vote, on average, and was only about 10 points ahead of DeSantis. Trump’s lead is now triple that, at closer to 30 points, on average. The advantages that DeSantis once had have similarly melted away. 
For example, in New Hampshire – where Wednesday’s town hall took place – DeSantis was up by 12 points over Trump at the beginning of the year, according to University of New Hampshire polling. Trump has now opened up a 20-point advantage in the latest UNH survey among likely GOP primary voters. Of course, if it was solely the polls where Trump was ahead, that might be one thing. Polling isn’t always predictive, and there could be some underlying dynamic that could shift it. But the other big factor working in Trump’s favor is that many elected Republicans are behind him. Trump has the endorsements of well more than 60 GOP governors and members of Congress. None of the Republicans running or thinking of running even come close to double digits in endorsements from elected officials of that level. The importance of endorsements should not be understated. The party apparatus uniting behind one candidate can often be key in turning a race around. On the Democratic side in 2020, for example, Joe Biden was lifted when his former rivals endorsed him ahead of Super Tuesday. The failure of the GOP establishment to back a single Trump alternative in 2016 was part of the reason he was able to win the nomination. Elected Republican officials were split on whom they wanted to be the nominee, and Trump was able to take advantage of this divided field. This time around, that option of the establishment rallying to a non-Trump candidate isn’t available to the same degree. Since 1980, anyone who had a similar share of endorsements as Trump does at this point in the nomination process has gone on to win their party’s nomination. What might it take for Trump to lose his front-runner status? The 1980 Democratic presidential primary might give us a clue. It remains the only time in modern history in which a nonincumbent (Sen. Ted Kennedy) was polling north of 45% at this point and didn’t go on to be his party’s nominee. Kennedy’s bid was overtaken by events on the ground. 
Specifically, then-President Jimmy Carter rallied back to win the Democratic nomination following the beginning of the Iran hostage crisis. A once clear lead for Kennedy became a large Carter advantage basically overnight. The crucial difference here is that there isn’t an incumbent in the 2024 Republican race. Still, there are a lot of potential events involving Trump. We don’t know how Republicans will react to his loss in civil court, which Trump has said he will appeal. We don’t know what will happen if he is convicted in a possible criminal trial related to the hush money payments. (After being charged by a Manhattan grand jury with 34 felony counts of falsifying business records in that case, he’s trying to move it to federal court.) We don’t know what will happen if he is indicted in other cases around the country related to his efforts to overturn the 2020 election results, his conduct in the run-up to the January 6, 2021, attack on the US Capitol, and his handling of classified documents. Perhaps the biggest unknown is how Republican voters will react if DeSantis actually gets into the race. It’s possible voters will look at him differently if and when he truly starts campaigning. For now, though, Trump remains well ahead. It will take something big to knock him off his perch at the top of the Republican polls. This story has been updated.
Apparently models ending in ‘xsum’ are designed for a single-line summary. I will modify the test and report.
I was testing to see if I could find anything. You can add length_penalty=3 to force it to generate a longer answer, but it starts repeating itself. You can also add repetition_penalty=1.5 to reduce the repetition. But the result isn’t a very good summary.
edit: yea, much better result when I switched to google/pegasus-x-large
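For anyone who wants to try the same thing, here is a rough sketch of how those options could be passed to generate(). The helper names generation_kwargs and summarize are made up for illustration, and num_beams=4 and max_new_tokens=200 are my own assumptions rather than values from this thread (length_penalty only has an effect with beam search, so the beam count is set explicitly). Treat it as a starting point, not a tuned recipe.

```python
def generation_kwargs(length_penalty=3.0, repetition_penalty=1.5,
                      num_beams=4, max_new_tokens=200):
    """Collect the generate() options discussed above into one dict.

    num_beams and max_new_tokens are illustrative assumptions.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": False,
        "num_beams": num_beams,                    # length_penalty needs beam search
        "length_penalty": length_penalty,          # > 0 favors longer sequences
        "repetition_penalty": repetition_penalty,  # > 1 discourages repeated tokens
    }


def summarize(article, model_name="google/pegasus-xsum", **overrides):
    """Summarize `article` with the given checkpoint (hypothetical helper)."""
    # Imported here so generation_kwargs() can be used without loading transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Truncate the article to the same input length used in the question.
    inputs = tokenizer(article, max_length=510, truncation=True, return_tensors="pt")
    outputs = model.generate(**inputs, **generation_kwargs(**overrides))
    # generate() returns a batch; decode the first (only) sequence.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling summarize(article) would use the xsum checkpoint with both penalties, while summarize(article, model_name="google/pegasus-x-large") swaps in the other model mentioned in the edit above.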
@wsfung2008 Thank you for looking into this question. Apparently all models ending in ‘xsum’ are designed for a single-line summary, so this was my fault for not understanding the model’s purpose.
However, I did run into a problem using models designed for longer abstractive summaries and posted another issue. Perhaps if you look at that one, you may come up with a solution. See my post titled “Generating Abstractive summaries”.