This issue might be similar: DPO Training ruins my model's conversational coherence
