Why does PEGASUS generate summaries with <n> tags?
Here is how I have initialized the model and how I generate summaries:
from transformers import PegasusForConditionalGeneration, PegasusTokenizerFast, PegasusConfig
import torch
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
pegasus_model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-pubmed').to(torch_device)
pegasus_tokenizer = PegasusTokenizerFast.from_pretrained('google/pegasus-pubmed', max_position_embeddings=2048)
def pegasus_summarization(article):
    # Tokenize the article, truncating to the model's maximum input length
    batch = pegasus_tokenizer.prepare_seq2seq_batch([article], truncation=True, padding='longest', max_target_length=250, return_tensors='pt').to(torch_device)
    # Generate the summary token IDs
    translated = pegasus_model.generate(**batch)
    # Decode back to text, dropping special tokens
    tgt_text = pegasus_tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text[0]
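For completeness, the summary below was produced by calling this function roughly like so (article_text here is just a placeholder for the full PubMed article):

# Hypothetical usage; article_text stands in for the full article text I summarize.
article_text = "..."
summary = pegasus_summarization(article_text)
print(summary)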
And here is the resulting summary:
anxiety is the most prominent and prevalent mood disorder in parkinson’s disease ( pd ) ; however, little is known about the relationship between anxiety and cognition in pd. <n> the aim of this study was to examine the influence of anxiety on cognition in pd by directly comparing groups of pd patients with and without anxiety while excluding depression. <n> we hypothesized that pd patients with anxiety would show impairments in attentional set - shifting and working memory compared to pd patients without anxiety.
I used PEGASUS in October last year and did not have this problem then. Maybe it is something that came with the v4.0.0 release of transformers?
I found others who have experienced the same issue (https://github.com/eeic-ai-01/text2slide/blob/8af85b423f68b399b88292c8a08c2cbf5a744ea1/summarization/abstractive/summarizer/pegasus.py), where they work around it by stripping the <n> tags with a regex substitution.
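For reference, that workaround boils down to something like the following post-processing of the decoded summary (a minimal sketch; the exact pattern and replacement in the linked file may differ):

import re

def strip_newline_tokens(summary: str) -> str:
    # <n> is the sentence-separator token PEGASUS emits; replace it with a newline.
    return re.sub(r'\s*<n>\s*', '\n', summary).strip()

# e.g. clean_summary = strip_newline_tokens(pegasus_summarization(article_text))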
Appreciate all answers!