I have a pretrained Encoder + Decoder model (Pegasus), and want to fine-tune it as described in this article.
Specifically, they use the following process:
In other words, they prepend a manual prompt to the generation of the model itself.
My question relates to the decoder input. Specifically, I want to fine-tune the model so that it takes the prompt (entity chain) and generates the summary from that point onwards:
```
<s> [ENTITYCHAIN] Frozen | Disney [SUMMARY] $tok_1 $tok_2 $tok_3 ...
=========================================== ^^^^^^ ^^^^^^ ^^^^^^
        This is not generated                 Generate from here
```
However, as you would expect, the model generates predictions for every token in the entity chain as well, which I do not need. More importantly, the loss is computed over those entity-chain predictions too. This undermines the purpose of training and confuses the model: it should learn to generate only the summary, not the entity chain, which is already given as a prompt.
To restate: I want to give a prompt (the entity chain) to the decoder and have it generate a summary while attending to the extra information in the prompt. The loss should be computed over the generated summary tokens only, excluding the prompt tokens.
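In case it helps frame the question, here is a minimal sketch of what I mean, using the standard Hugging Face convention that label positions set to `-100` are ignored by `CrossEntropyLoss`. The token ids, `start_id`, and `eos_id` below are placeholders, not the real Pegasus vocabulary:

```python
import torch

IGNORE_INDEX = -100  # CrossEntropyLoss skips positions labelled -100


def build_decoder_tensors(prompt_ids, summary_ids, start_id=0, eos_id=1):
    """Align decoder inputs and labels so the loss skips the prompt.

    The decoder sees [start] + prompt + summary, and at each position t the
    model is trained to predict labels[t]. Masking the first len(prompt_ids)
    labels with -100 removes the entity-chain predictions from the loss.
    start_id / eos_id are illustrative defaults, not Pegasus's actual ids.
    """
    decoder_input_ids = [start_id] + prompt_ids + summary_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + summary_ids + [eos_id]
    assert len(decoder_input_ids) == len(labels)
    return torch.tensor([decoder_input_ids]), torch.tensor([labels])


# toy ids standing in for the tokenized entity chain and summary
dec_in, labels = build_decoder_tensors([11, 12, 13], [101, 102])
# passing decoder_input_ids=dec_in and labels=labels to the model would then
# compute the loss on the summary (and EOS) positions only
```

This is only meant to illustrate the alignment I am after; my question is whether the library offers a built-in way to achieve it.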
Looking through the model documentation, I can't find an option to do this. Any ideas?