Use Pegasus in Huggingface for a downstream classification task

I have collected a dataset of paragraph-summary pairs, where a summary may or may not correspond to the paragraph it is paired with. I also have labels indicating whether each summary corresponds to its paragraph (1 for a corresponding pair, 0 otherwise).
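To make the setup concrete, a single (made-up) entry looks roughly like this; the field names are just how I happen to store the data:

```python
# One made-up example entry from my dataset
example = {
    "paragraph": "The city council approved the new transit budget on Tuesday ...",
    "summary": "Council passes transit budget.",
    "label": 1,  # 1 = summary corresponds to the paragraph, 0 = it does not
}
```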

I would like to take the pretrained pegasus-large model from Hugging Face (off the shelf) and fine-tune it on this downstream classification task.

Since Pegasus does not have a CLS token, I have been thinking about possible ways to do this.

My idea is to concatenate the paragraph and summary, pass the result through the pretrained Pegasus encoder only, and then pool over the encoder's final hidden states. However, if I use the Hugging Face PegasusModel (the bare model without the summary generation head), it expects me to provide decoder_input_ids, which I assume are the ground-truth tokens (labels) used when Pegasus is trained as a seq2seq model for summary generation. Since I am not training the model to generate summaries and only want the encoder representation, I am not sure what to pass as decoder_input_ids.
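Concretely, here is a minimal sketch of what I have in mind (PyTorch; the mean-pooling and the linear head are my own additions on top of the pretrained encoder, and I am not sure whether pulling the encoder out like this is the intended way to use the model):

```python
import torch
import torch.nn as nn
from transformers import PegasusTokenizer, PegasusModel

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
pegasus = PegasusModel.from_pretrained("google/pegasus-large")

class PairClassifier(nn.Module):
    """Binary classification head (my addition) on top of the Pegasus encoder."""
    def __init__(self, pegasus):
        super().__init__()
        self.encoder = pegasus.get_encoder()  # encoder only, no decoder
        self.classifier = nn.Linear(pegasus.config.d_model, 2)

    def forward(self, input_ids, attention_mask):
        # Calling the encoder directly never asks for decoder_input_ids.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state  # (batch, seq_len, d_model)
        # Mean-pool over non-padding positions, since there is no CLS token.
        mask = attention_mask.unsqueeze(-1).type_as(hidden)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        return self.classifier(pooled)

clf = PairClassifier(pegasus)
# Plain string concatenation of paragraph and summary, as described above.
enc = tokenizer(example["paragraph"] + " " + example["summary"],
                truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = clf(enc["input_ids"], enc["attention_mask"])  # shape (1, 2)
```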

My questions are:

1. Am I right in assuming that decoder_input_ids are only needed when training the model for sequence generation?
2. How can I get the last hidden layer outputs of the encoder in an encoder-decoder model without providing any decoder_input_ids?

I have posted the same question on Stack Overflow as well.