Prompt loss weight instead of masking in generative models

When fine-tuning prompt + completion tasks in an LLM, it is common to give the prompt tokens less weight than the completion tokens. For instance, the default in GPT-3 fine-tuning is a prompt loss weight of 0.01 relative to the completion tokens. In other setups I have seen simply 0.0, and that seems to be the only possibility in the transformers library, because the models have no way to specify which part of the input is the prompt, nor to assign it a special weight in the loss. The only way I know is to set the corresponding "labels" tokens to the special value -100, so they are excluded from the loss.
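For context, this is roughly what the masking approach looks like with the transformers library (a minimal sketch; the model name and example texts are just placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Translate to French: Hello, world."
completion = " Bonjour, le monde."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
completion_ids = tokenizer(completion, return_tensors="pt").input_ids

input_ids = torch.cat([prompt_ids, completion_ids], dim=1)
labels = input_ids.clone()
# Set prompt positions to -100 (the ignore_index of CrossEntropyLoss),
# so only completion tokens contribute to the loss
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss  # averaged over completion tokens only
```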

So, my question is: have you experimented with an arbitrary prompt loss weight (between zero and one, relative to the completion tokens) when fine-tuning a generative model? If so, what is your technique?
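To make it concrete, the kind of thing I have in mind is computing the loss manually with a per-token weight instead of passing labels to the model. A rough sketch (the function name and the prompt_loss_weight hyperparameter are hypothetical, just to illustrate the idea):

```python
import torch
import torch.nn.functional as F

def weighted_causal_lm_loss(logits, input_ids, prompt_len, prompt_loss_weight=0.01):
    # Shift so that each position predicts the next token
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]

    # Per-token cross entropy, without reduction
    per_token_loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view(shift_labels.size())

    # Down-weight positions that predict prompt tokens instead of masking them
    weights = torch.ones_like(per_token_loss)
    weights[:, : prompt_len - 1] = prompt_loss_weight

    return (per_token_loss * weights).sum() / weights.sum()
```

With the Trainer, I imagine this would go into an overridden compute_loss, but I have not tried it yet.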


Did you ever figure it out?
