Summarization - Pegasus - min_length

xchochu · November 9, 2020, 11:05am

Hello.

I am playing around with the minimal number of tokens in the generated output by “google/pegasus-cnn_dailymail”. Basically I follow the documentation so my code looks like this

    batch = tokPeg.prepare_seq2seq_batch(src_texts=[s]).to(torch_device) 
    gen = modelPeg.generate(**batch, 
                        num_beams=int(8), min_lenght=100)
    summary: List[str] = tokPeg.batch_decode(gen, skip_special_tokens=True)

However, when I count number of tokens in output text by len(tokPeg.tokenize(summary[0])) the output text produces fewer tokens than is specified in min_length. Is there anything I am missing?

rgwatwormhill · November 10, 2020, 11:54am

this might be a red herring, but your code snippet shows “min_lenght” where it should be “min_length”

Topic		Replies	Views
Questions about Pegasus for Summarization 🤗Transformers	1	787	August 24, 2020
Creating summaries of fixed length with PEGASUS model 🤗Transformers	1	474	July 13, 2022
Pegasus Questions 🤗Transformers	29	3944	July 5, 2021
Minimum number of tokens in generate Models	0	1067	March 10, 2023
T5 Generates very short summaries 🤗Transformers	22	5559	September 11, 2020

Summarization - Pegasus - min_length

Related topics