Fine-tune with SFTTrainer

Hi, not sure if you have tried or seen this. When I run SFT on completions only, using DataCollatorForCompletionOnlyLM, I get NaN gradients very quickly. However, when I use the default SFT, which trains on the entire input, everything works well. Do you have any ideas why?

My issue is linked here: TRL SFT super prone to nan when using data collator
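
One thing that may be worth ruling out (this is a guess, not something confirmed in the linked issue): if the collator cannot find the response template in a sample, for example because of a tokenization mismatch or because truncation cut the completion off, it masks every label in that sample with -100. If a whole batch ends up with no supervised tokens, a mean-reduced cross-entropy loss averages over zero tokens, which can produce NaN. Below is a minimal sketch of a check for that. The helper names and the example labels are made up for illustration and are not part of TRL.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def fully_masked_rows(labels_batch, ignore_index=IGNORE_INDEX):
    """Return the indices of rows in which every label equals ignore_index."""
    return [
        i
        for i, row in enumerate(labels_batch)
        if all(label == ignore_index for label in row)
    ]


def supervised_token_count(labels_batch, ignore_index=IGNORE_INDEX):
    """Count the labels that would actually contribute to the loss."""
    return sum(label != ignore_index for row in labels_batch for label in row)


# Hypothetical labels for a batch of two samples (illustrative values only).
# The second row has no supervised tokens at all.
example_labels = [
    [-100, -100, 42, 17, 2],
    [-100, -100, -100, -100, -100],
]

print(fully_masked_rows(example_labels))
print(supervised_token_count(example_labels))
```

To run this on real data, I would assume something like `labels_batch = collator(examples)["labels"].tolist()`, where `collator` is the DataCollatorForCompletionOnlyLM instance and `examples` is a list of tokenized samples. If any batch reports zero supervised tokens, that would point at the masking rather than at the optimizer or precision settings.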