I have collected a dataset of paragraph-summary pairs, where the summary may or may not correspond to the paragraph it is paired with. I also have a label for each pair indicating whether the summary corresponds to the paragraph (1 if it is a corresponding pair, 0 if it is not).
I would like to take the off-the-shelf pretrained Pegasus_large model from Hugging Face and fine-tune it on this downstream classification task.
Since Pegasus does not have any CLS token, I was thinking of possible ways of doing this.
I want to concatenate the paragraph and summary, pass the result through the pretrained Pegasus encoder only, and then pool over the encoder's final hidden layer outputs. If I use the Hugging Face PegasusModel (the one without any summary generation head), it expects me to provide decoder_input_ids, which I assume are the true tokens (labels) when Pegasus is trained as a seq2seq model for summary generation. However, since I am not training my model to generate summaries and only want the encoder representation, I am not sure what to pass as decoder_input_ids.
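To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes that `PegasusModel.get_encoder()` gives back the encoder as a standalone module that can be called with just `input_ids` and `attention_mask`, and I am not sure this is the recommended route. The names `masked_mean_pool`, `PairClassifier`, `encode_pairs` and `build_pretrained_classifier` are my own, hypothetical helpers. Joining the paragraph and summary with a single space and using mean pooling are also just assumptions on my part. The checkpoint name `google/pegasus-large` is what I believe the Hub identifier to be.

```python
import torch
from torch import nn
from transformers import PegasusModel, PegasusTokenizer


def masked_mean_pool(hidden, attention_mask):
    """Average token vectors over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * mask).sum(dim=1)                    # (batch, d_model)
    counts = mask.sum(dim=1).clamp(min=1.0)                # avoid division by zero
    return summed / counts


class PairClassifier(nn.Module):
    """Hypothetical wrapper: Pegasus encoder + mean pooling + linear head."""

    def __init__(self, encoder, hidden_size, num_labels=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Only the encoder is called, so no decoder_input_ids are passed.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = masked_mean_pool(out.last_hidden_state, attention_mask)
        return self.head(pooled)  # (batch, num_labels) logits


def encode_pairs(tokenizer, paragraphs, summaries):
    """Concatenate each paragraph with its summary (assumed: space-joined)."""
    texts = [p + " " + s for p, s in zip(paragraphs, summaries)]
    return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")


def build_pretrained_classifier(name="google/pegasus-large"):
    """Load the pretrained weights and keep only the encoder stack."""
    tokenizer = PegasusTokenizer.from_pretrained(name)
    model = PegasusModel.from_pretrained(name)
    return tokenizer, PairClassifier(model.get_encoder(), model.config.d_model)
```

With this, I would call `tokenizer, clf = build_pretrained_classifier()`, then `batch = encode_pairs(tokenizer, paragraphs, summaries)` and `logits = clf(batch["input_ids"], batch["attention_mask"])`, and train with a cross-entropy loss against the 0/1 labels.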
My questions are: 1. Am I right in assuming that decoder_input_ids are only used when training the model for sequence generation? 2. How should I get the encoder's last hidden layer outputs without providing any decoder_input_ids to an encoder-decoder model?
I have posted the same question on Stack Overflow as well.