Why does PEGASUS generate summaries with <n> tags?
Here is how I initialize the model and generate summaries:
from transformers import PegasusForConditionalGeneration, PegasusTokenizerFast, PegasusConfig
import torch

torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
pegasus_model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-pubmed').to(torch_device)
pegasus_tokenizer = PegasusTokenizerFast.from_pretrained('google/pegasus-pubmed', max_position_embeddings=2048)

def pegasus_summarization(article):
    batch = pegasus_tokenizer.prepare_seq2seq_batch([article], truncation=True, padding='longest', max_target_length=250, return_tensors='pt').to(torch_device)
    translated = pegasus_model.generate(**batch)
    tgt_text = pegasus_tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text
And here is the resulting summary:
anxiety is the most prominent and prevalent mood disorder in parkinson’s disease ( pd ) ; however, little is known about the relationship between anxiety and cognition in pd. <n> the aim of this study was to examine the influence of anxiety on cognition in pd by directly comparing groups of pd patients with and without anxiety while excluding depression. <n> we hypothesized that pd patients with anxiety would show impairments in attentional set - shifting and working memory compared to pd patients without anxiety.
I used PEGASUS in October last year, and this was not a problem then. Could it be something that came with the v4.0.0 release of transformers?
I found others who have experienced the same thing (text2slide/pegasus.py at 8af85b423f68b399b88292c8a08c2cbf5a744ea1 · eeic-ai-01/text2slide · GitHub); see the regex substitution of the <n> tags there.
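For reference, this is roughly the kind of workaround I mean. It is only a minimal sketch of a regex substitution, not the code from the linked repository. The helper name strip_newline_tags is my own, and I am assuming that <n> stands for a line break in the model's output, so I replace it with a single space:

```python
import re

def strip_newline_tags(summary: str) -> str:
    """Replace every literal "<n>" marker with a single space.

    Assumption: "<n>" marks a line break in the generated text, so a
    space is a reasonable substitute when one flat paragraph is wanted.
    """
    # Swallow whitespace around the tag so no double spaces are left behind.
    cleaned = re.sub(r"\s*<n>\s*", " ", summary)
    return cleaned.strip()

example = (
    "little is known about the relationship between anxiety and cognition in pd. "
    "<n> the aim of this study was to examine the influence of anxiety on cognition in pd."
)
print(strip_newline_tags(example))
```

This can be applied to each string returned by pegasus_summarization. If the line breaks should be kept instead, the replacement string could be "\n" rather than " ".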
Appreciate all answers!