Prompt loss weight instead of masking in generative models

When fine-tuning prompt + completion tasks in an LLM, it is common to give the prompt tokens less weight than the completion tokens. For instance, the default in GPT-3 fine-tuning is a prompt loss weight of 0.01 relative to the completion tokens. In other setups I have seen simply 0.0, and that seems to be the only possibility in the transformers library, because the models have no way to specify which part of the input is the prompt, nor to assign it a special weight in the loss. The only way I know is to set the corresponding "labels" tokens to the special value -100, so they are excluded from the loss.
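For context, this is roughly what the masking approach looks like with the transformers library (a minimal sketch; the model name and example texts are just placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Translate to French: Hello, world."
completion = " Bonjour, le monde."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
completion_ids = tokenizer(completion, return_tensors="pt").input_ids

input_ids = torch.cat([prompt_ids, completion_ids], dim=1)
labels = input_ids.clone()
# Set prompt positions to -100 (the ignore_index of CrossEntropyLoss),
# so only completion tokens contribute to the loss
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss  # averaged over completion tokens only
```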

So, my question is: have you experimented with an arbitrary prompt loss weight (between zero and one, relative to the completion tokens) when fine-tuning a generative model? If so, what is your technique?
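To make it concrete, the kind of thing I have in mind is computing the loss manually with a per-token weight instead of passing labels to the model. A rough sketch (the function name and the prompt_loss_weight hyperparameter are hypothetical, just to illustrate the idea):

```python
import torch
import torch.nn.functional as F

def weighted_causal_lm_loss(logits, input_ids, prompt_len, prompt_loss_weight=0.01):
    # Shift so that each position predicts the next token
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]

    # Per-token cross entropy, without reduction
    per_token_loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view(shift_labels.size())

    # Down-weight positions that predict prompt tokens instead of masking them
    weights = torch.ones_like(per_token_loss)
    weights[:, : prompt_len - 1] = prompt_loss_weight

    return (per_token_loss * weights).sum() / weights.sum()
```

With the Trainer, I imagine this would go into an overridden compute_loss, but I have not tried it yet.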


Did you ever figure it out?
