I am trying to train a process reward model (PRM), where the reward model outputs a reward at each reasoning step instead of only at the end, as a traditional outcome reward model does. I am trying to replicate this paper: [[2305.20050] Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)
I was wondering whether I can use a process reward model with PPOTrainer. If so, how do I configure it?
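
To make the question concrete, here is a minimal sketch of the difference I have in mind. It is plain Python and does not use any TRL API. The function names `outcome_rewards` and `process_rewards` are hypothetical, and placing each step's score on the last token of that step is my own assumption, not something taken from the paper or from PPOTrainer.

```python
from typing import List


def outcome_rewards(num_tokens: int, final_score: float) -> List[float]:
    """Outcome-style reward: one scalar, placed on the last token only."""
    rewards = [0.0] * num_tokens
    if num_tokens > 0:
        rewards[-1] = final_score
    return rewards


def process_rewards(step_lengths: List[int], step_scores: List[float]) -> List[float]:
    """Process-style reward: one score per reasoning step.

    Assumption (hypothetical layout): each step's score is placed on the
    last token of that step, and all other tokens get 0.0.
    """
    if len(step_lengths) != len(step_scores):
        raise ValueError("need exactly one score per step")
    rewards: List[float] = []
    for length, score in zip(step_lengths, step_scores):
        if length <= 0:
            raise ValueError("each step must contain at least one token")
        # Zeros for the step's inner tokens, then the step score at its end.
        rewards.extend([0.0] * (length - 1) + [score])
    return rewards


if __name__ == "__main__":
    # A response of 9 tokens split into 3 steps of 3, 2, and 4 tokens.
    print(outcome_rewards(9, 1.0))
    print(process_rewards([3, 2, 4], [0.9, 0.2, 0.7]))
```

What I am unsure about is whether PPOTrainer can consume a per-token reward vector like the second one, or whether it expects a single scalar per response.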