Few-shot learning and fine-tuning of LMs on a summarization task

Hello there,

I am new to the forum and to NLP in general. I am starting this topic to understand more about language models and how Hugging Face can be used for few-shot learning and fine-tuning.

I am interested in the text summarization task. I know there are already some pre-trained models such as BART, T5, and Pegasus that perform summarization quite well, and I have already played with them using the Hugging Face transformers library. I also know there are some community notebooks such as this and that, and GitHub issue #4406 (I could not link it since I am new), on how to fine-tune these models for more specific summarization tasks that differ from the common CNN/DailyMail corpus.

What I would like to understand better is how to use a generic LM for text summarization. T5 and BART have a ForConditionalGeneration class; however, models like BERT, FlauBERT, GPT, GPT-2, and XLM do not have this class, only an LM head. I have read that GPT-3 can perform any given task by just “looking” at a few examples, and I am wondering if there is a way to do the same with the LMs I just cited. Moreover, most summarization discussions focus on BART and T5, and I could not find any guide on how to actually fine-tune generic LMs (BERT, FlauBERT, GPT, GPT-2, XLM, etc.) on such a task.
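To make the class situation I mean concrete, here is a quick check (assuming a recent transformers version; this only imports the classes, no weights are downloaded):

```python
import transformers

# seq2seq models like BART ship a ForConditionalGeneration class...
print(hasattr(transformers, "BartForConditionalGeneration"))  # True
# ...while decoder-only models like gpt2 do not,
print(hasattr(transformers, "GPT2ForConditionalGeneration"))  # False
# they only ship an LM-head class
print(hasattr(transformers, "GPT2LMHeadModel"))  # True
```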

I have tried GPT-2 for text summarization by feeding it the article followed by the string “TL;DR:”, but the results are quite bad.

:sweat_smile: TL;DR: in short, my questions are: how can I do few-shot learning on summarization with models such as GPT-2 that only have an LM head class, using Hugging Face? And how can I properly fine-tune them on such a task?
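For concreteness, the kind of few-shot prompt I have in mind (GPT-3 style in-context examples; the articles and summaries below are made-up placeholders) is something like this — is this the right way to build it?

```python
# hypothetical few-shot prompt: a few (article, summary) demonstrations,
# then the target article with a trailing "TL;DR:" for the model to complete
examples = [
    ("first example article ...", "first example summary"),
    ("second example article ...", "second example summary"),
]
target_article = "the article i actually want summarized ..."

prompt = ""
for article, summary in examples:
    prompt += f"{article}\nTL;DR: {summary}\n\n"
prompt += f"{target_article}\nTL;DR:"

print(prompt)
```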

P.S. The Hugging Face transformers library is amazing, and I hope the community keeps on thriving!