Reward model for image pairs

Hi, has anyone here tried training a reward model on image pairs with a Bradley-Terry loss? I couldn’t find anything like this online, so I’m wondering how I would set something like this up.
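
To make the question concrete, here is a minimal sketch of the objective I have in mind. It assumes the model maps each image to a single scalar score. The function names are hypothetical, and in a real setup the scores would come from some image encoder with a scalar head rather than being passed in by hand.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that image A is preferred over image B.

    P(A > B) = sigmoid(score_a - score_b), computed in a form that
    avoids overflow when the score difference is large.
    """
    diff = score_a - score_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of one labelled pair.

    -log(sigmoid(d)) equals softplus(-d), where d is the score of the
    preferred image minus the score of the rejected one.
    """
    diff = score_preferred - score_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (preferred_score, rejected_score) pairs."""
    return sum(bradley_terry_loss(p, r) for p, r in pairs) / len(pairs)
```

The loss is small when the preferred image gets the higher score and grows roughly linearly when the ranking is wrong. Equal scores give a loss of log 2. What I am unsure about is everything around this: which backbone to use, and how to batch the pairs.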