Hi,
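That’s supported in the TRL library through the DataCollatorForCompletionOnlyLM class, which masks the prompt tokens so the loss is computed only on the completion. See the "DataCollatorForCompletionOnlyLM" section of the Supervised Fine-tuning Trainer docs.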
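To make the idea concrete, here is a minimal pure-Python sketch of what a completion-only collator does to the labels. This is an illustration, not TRL's actual code. The helper names `find_subsequence` and `completion_only_labels` are hypothetical, and the token ids below are made up. The sketch copies the input ids into the labels, then sets every position up to and including the response template to -100, the index PyTorch's cross-entropy loss ignores by default. In TRL itself you pass the template string and a tokenizer, e.g. `DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)`, and the collator does the matching on tokenized batches.

```python
from typing import List

# Label value skipped by PyTorch's CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def find_subsequence(seq: List[int], pattern: List[int]) -> int:
    """Return the start index of the first occurrence of pattern in seq, or -1."""
    n, m = len(seq), len(pattern)
    for start in range(n - m + 1):
        if seq[start:start + m] == pattern:
            return start
    return -1


def completion_only_labels(input_ids: List[int], response_template_ids: List[int]) -> List[int]:
    """Hypothetical helper: mask everything up to and including the response template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: mask the whole example so it contributes no loss.
        return [IGNORE_INDEX] * len(input_ids)
    end = start + len(response_template_ids)
    for i in range(end):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up token ids: prompt = [5, 6, 7], response template = [42, 43], answer = [8, 9]
ids = [5, 6, 7, 42, 43, 8, 9]
print(completion_only_labels(ids, [42, 43]))
# → [-100, -100, -100, -100, -100, 8, 9]
```

Only the answer tokens (8 and 9) keep their labels, so the model is trained to predict the completion and not to reproduce the prompt.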