GPT2 summarization performance

bpraveenk · October 30, 2021, 5:03pm

Has anyone run benchmark studies to evaluate the generation/summarization performance of GPT2 on datasets such as “xsum”? If so, could you share the performance numbers (in terms of ROUGE scores) you got? I searched for these results online, but couldn’t find any.
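For anyone comparing numbers, here is a minimal sketch of what a ROUGE-N score measures: n-gram overlap between a reference summary and a generated one. It is a simplification that assumes whitespace tokenization and lowercasing, with no stemming. The names `ngrams` and `rouge_n` are made up for illustration, and published results are normally computed with a dedicated ROUGE package, so scores from this sketch are not directly comparable to them.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall and F1.

    Assumption: plain whitespace tokenization and lowercasing, no stemming.
    """
    ref_counts = ngrams(reference.lower().split(), n)
    cand_counts = ngrams(candidate.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n(reference, candidate, n=1))
    print(rouge_n(reference, candidate, n=2))
```

ROUGE-1 and ROUGE-2 correspond to `n=1` and `n=2`; ROUGE-L (longest common subsequence) is not covered by this sketch.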

Kirili4ik · October 31, 2021, 11:16am

Hi,

I can suggest starting to look here:

I haven’t found any either, to be honest. But!

I believe it is smarter to use encoder-decoder models (like PEGASUS or BART) for summarization. Decoder-only language models like GPT are trained only to continue text (unlike BART), and they don’t explicitly extract the idea of the given text. An encoder, by contrast, can be interpreted as an “idea extractor”, with the decoder acting as the generator of natural-language text.

P.S. I know one paper that tries to show that summarizing with GPT3 is better than with BART. Even though its experiments rely heavily on Russian, you can use the references in the paper to dig deeper and find what you are looking for. Paper arxiv link:

Good luck and let me know if you find anything,
Kirill

bpraveenk · November 1, 2021, 8:26pm

Thank you, Kirill, for sharing the pointers. I agree with you that BART and PEGASUS are better suited to text summarization than decoder-only models. However, I was curious whether someone had experimented with GPT2 variants for text generation. I found some sample implementations online, but no performance metrics on standard datasets. I also feel it is not straightforward to run inference (e.g., generate summaries) with GPT2. Caveats such as penalizing long summaries, using special tokens to train and run inference with a decoder-only model, and not-immediately-obvious decoding strategies make inference tricky, IMO. I ran a few tests and found the performance to be well below par, contrary to some claims made in papers about how readily GPT2-style models improve performance on supervised tasks, especially text generation. Nevertheless, I will keep looking and will update this thread if I find any relevant articles or a robust way of generating summaries with GPT2.
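To make the special-token caveat concrete, here is a rough sketch of one common way to lay out a training example for a decoder-only model: article tokens, a separator, summary tokens, then an end-of-sequence token, with the loss masked on the article part. This is not a tested recipe from my experiments. `build_example`, `sep_id` and `eos_id` are hypothetical names, the token ids are toy values, and it assumes the training code shifts labels by one position internally (as Hugging Face causal LM heads do) and skips positions labeled -100 (the default `ignore_index` of PyTorch's cross-entropy loss).

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def build_example(article_ids, summary_ids, sep_id, eos_id, max_len=1024):
    """Build (input_ids, labels) for summarization with a decoder-only LM.

    Layout: article tokens + [sep] + summary tokens + [eos].
    The loss is only computed on the summary and the final eos token.
    """
    # Reserve room for the summary plus the two special tokens,
    # then truncate the article so the whole sequence fits in max_len.
    budget = max_len - len(summary_ids) - 2
    if budget < 0:
        raise ValueError("summary does not fit within max_len")
    article_ids = list(article_ids[:budget])
    summary_ids = list(summary_ids)

    input_ids = article_ids + [sep_id] + summary_ids + [eos_id]
    # Mask the article and the separator so they contribute no loss.
    labels = [IGNORE_INDEX] * (len(article_ids) + 1) + summary_ids + [eos_id]
    return input_ids, labels


if __name__ == "__main__":
    # Toy ids for illustration only; real ids would come from a tokenizer.
    inputs, labels = build_example([1, 2, 3], [7, 8], sep_id=50, eos_id=51, max_len=10)
    print(inputs)
    print(labels)
```

At inference time the prompt would be the article followed by the same separator, and generation would stop once the end-of-sequence token is produced.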

amka66 · July 17, 2022, 8:46am

Any updates on that?

Do we need to conclude that decoder-only models are not suited for text summarization?

And how does it apply to GPT-3 – the latter seems to summarize quite well – do we expect a 175B parameter seq2seq model to perform considerably better?
