Pegasus Questions

Yes, and PegasusTokenizer.from_pretrained('google/pegasus-arxiv') should truncate the input for you.

About 1-5% of the time I run Pegasus ('google/pegasus-xsum') I get repeating strings of various lengths. I’ve seen this before on other seq2seq models using LSTMs, but is there a way to avoid this behavior? I assume I’m not the first to notice this with Pegasus.

You’re the first one to report it! Could we see an example input and output?

Maybe I’m doing something wrong then. Here are a couple of examples:
Let me know if you get something different.

text_input:
jobs on hold. i do apologize about the wait. how can i help out. i think i figured it out now. i was trying to make a report of all jobs on hold. but thank you. happy wednesday

Pegasus output:
jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on hold, jobs on

text_input:
invoice issue. how can i help you today. why do i see balance. next to bill to. got ya. you seeing that balance because the job that payment is tied to is completed yet. once the job is complete the balance will go down to. have a good one. you as well

Pegasus output:
how can i help you with an invoice issue, how can i help you with an invoice issue, how can i help you with an invoice issue, how can i help you with an invoice

text_input:
quickbooks desktop integration. how may i help you today. yes i need help. may i know what is occurring today. sorry going to have to chat back

Pegasus output:
quickbooks desktop integration - how may i help you today.,,,,,,,,,,,,,,,,,,,,,,,,,,

try passing min_length=0 to generate

Thanks, that definitely helped minimize the cases. I usually have it set to 12-15 because that gives better, more meaningful summaries. I guess the tradeoff is that the model sometimes produces the repeated strings.

Hi,

What does google/pegasus-pubmed do if you feed it an input longer than 1024 words like a biomedical article?

Thanks! I added min_length=0 to the tokenizer.batch_decode(…, min_length=0) call. Is this the right parameter to be setting? It doesn’t seem to fix the repeated output I see. I’m wondering where else I can investigate (I’m relatively new to customizing models…). Thank you for your help!

ex. Pegasus Summary
[“Today we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about infrastructure, we’re talking about”]

I’m trying to understand exactly the differences among the pre-trained Pegasus models.

As far as I understood:

  • models like google/pegasus-* (e.g. google/pegasus-xsum) are base models
  • all base models are fine-tuned on a dataset (e.g. xsum in the previous example)
  • google/pegasus-large is only pretrained (on C4 and Newsroom?)
  • sshleifer/distill-pegasus-* and sshleifer/student_pegasus-* are distilled models
  • google/bigbird-pegasus-large-* use the bigbird attention mechanism.

My questions are the following:

  1. Is my understanding correct?
  2. Is there any way to get a base Pegasus which is not fine-tuned on a downstream dataset?
  3. Is google/pegasus-multi_news multilingual?
  4. As for the distilled models, what is the difference between a distill-* and a student-* model? And what do the two numbers represent (e.g. in sshleifer/distill-pegasus-cnn-16-4)?

Thank you very much and thanks for all the work.

Hello, my question is: can we control the length of the output summary? Is there any parameter that controls the length? Can we produce summaries longer than the max_length parameter?
Currently I’m using ‘google/pegasus-multi_news’.

Thanks in advance