Hello, how do I train the model only on responses rather than prompt and response?
Is it just a matter of attention masks?
Hi,
The easiest way is to use the SFTTrainer from trl, combined with the DataCollatorForCompletionOnlyLM. The latter lets you train only on the responses, not on the prompts.
It’s brand new, we’re adding docs for it here: Add `DataCollatorForCompletionOnlyLM` in the docs by younesbelkada · Pull Request #565 · lvwerra/trl · GitHub
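On the attention-mask question: as far as I understand it, this is done through the labels rather than the attention mask. Prompt tokens get the label -100, which PyTorch's cross-entropy loss ignores by default, while the attention mask is left alone so the model still sees the prompt. Here is a rough plain-Python sketch of that idea. It is not the collator's actual code; the helper names and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the index where `pattern` first starts in `sequence`, or -1 if absent."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels and ignore everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Assumption: with no response marker, the whole example is excluded from the loss.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    labels[:response_start] = [IGNORE_INDEX] * response_start
    return labels


# Hypothetical token ids: prompt = [11, 12, 13], response template = [7, 8],
# response = [21, 22, 2]
input_ids = [11, 12, 13, 7, 8, 21, 22, 2]
labels = mask_prompt_labels(input_ids, [7, 8])
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 2]
```

Only the last three positions contribute to the loss here. The response template plays the role of whatever marker separates prompt from answer in your data (for example the tokens of a string like "### Answer:", which is just an assumed format).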
very interesting, thanks!