Fine-tune GPT2/LLaMA in a seq2seq manner

With GPT2/LLaMA, by default, we feed the whole sequence [prompt label] to the model during fine-tuning (model([prompt label])), calculate the cross-entropy loss on the label part only, and read the predictions from model().logits.
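To make the default setup concrete, here is a small sketch of that loss computation in plain PyTorch. The tensors are toy stand-ins (random logits instead of a real model's output); the point is the -100 ignore index, which is how the prompt positions are excluded from the cross-entropy:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: a 6-token sequence over a vocab of 10.
# In real fine-tuning, `logits` would come from model(input_ids).logits.
torch.manual_seed(0)
logits = torch.randn(6, 10)
input_ids = torch.tensor([1, 2, 3, 4, 5, 6])

# Say the first 4 tokens are the prompt x and the last 2 are the label y.
labels = input_ids.clone()
labels[:4] = -100  # ignore_index: prompt positions contribute no loss

# Causal-LM shift: position t predicts token t+1.
loss = F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)
# `loss` is the average -log p(y|x): cross-entropy over the label tokens only.
```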

Is there any way to input only the prompt and do the fine-tuning in a seq2seq manner (model(prompt))? That way we would minimize the negative log-likelihood -log p(y|x).


Commenting to follow this thread.


I assume you mean you want to train the model only on the completions, rather than on the instructions (prompts)? This is supported by the DataCollatorForCompletionOnlyLM collator in the TRL library. You can use it in combination with the SFTTrainer in order to only train the model on the completions.