How to train LLM only on response

Hello, how do I train the model only on responses rather than prompt and response?

Is it just a matter of attention masks?


The easiest way is to use trl's SFTTrainer together with the DataCollatorForCompletionOnlyLM. The collator lets you train only on the responses, not on the prompts.

It’s brand new, we’re adding docs for it here: Add `DataCollatorForCompletionOnlyLM` in the docs by younesbelkada · Pull Request #565 · lvwerra/trl · GitHub
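
On the attention mask question: the attention mask only controls which tokens the model can attend to, and the prompt still has to be attended to. What decides where the loss is computed is the labels. Here is a rough, library-free sketch of that idea, not trl's actual code: every label up to and including a response template is set to -100, which is the default `ignore_index` of PyTorch's cross-entropy loss. The function names and the toy token ids are made up for illustration, and masking the whole example when the template is missing is a choice made for this sketch.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels and hide the prompt part from the loss.

    Everything up to and including the response template is replaced by
    IGNORE_INDEX, so only the response tokens contribute to the loss.
    """
    labels = list(input_ids)
    start = find_subsequence(list(input_ids), list(response_template_ids))
    if start == -1:
        # Assumption for this sketch: no template found, so ignore the whole example.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    labels[:response_start] = [IGNORE_INDEX] * response_start
    return labels


# Toy example: hypothetical ids, where [90, 91] stands in for a tokenized
# response template such as "### Answer:".
input_ids = [11, 12, 13, 90, 91, 21, 22, 2]
response_template_ids = [90, 91]
labels = mask_prompt_labels(input_ids, response_template_ids)
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 2]
```

The attention mask stays all ones for the real tokens in this setup; only the labels change.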


very interesting, thanks!