Is there a way to set up per-sample weighting when computing the loss in DPOTrainer? For example, I know some preference pairs are more reliable than others, so I would like to give them more weight.
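
To make the request concrete, here is a rough sketch of the behavior being asked about, written in plain Python rather than against TRL. It applies a per-pair weight to the standard sigmoid DPO loss. The names `dpo_pair_loss`, `weighted_dpo_loss`, and `weights` are hypothetical, and nothing here assumes DPOTrainer exposes such an option; normalizing by the sum of the weights is also just one possible choice.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard sigmoid DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def weighted_dpo_loss(pairs, weights, beta=0.1):
    """Hypothetical weighted batch loss: weighted average of the per-pair losses.

    pairs   -- list of (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
               sequence log-probabilities
    weights -- one non-negative reliability weight per pair (an assumption,
               not an existing DPOTrainer argument)
    """
    if len(pairs) != len(weights):
        raise ValueError("need exactly one weight per preference pair")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    losses = [dpo_pair_loss(*pair, beta=beta) for pair in pairs]
    return sum(w * l for w, l in zip(weights, losses)) / total_weight

# Two pairs; the first is considered three times as reliable as the second.
pairs = [(-1.0, -2.0, -1.5, -1.5), (-2.0, -1.0, -1.5, -1.5)]
loss = weighted_dpo_loss(pairs, weights=[3.0, 1.0])
```

With equal weights this reduces to the usual mean over the batch, so the weights only change how much each pair contributes to the average.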