Process Reward Model compatibility with PPOTrainer

anki08 · October 23, 2024, 6:13pm

I am trying to train a process reward model (PRM), where the reward model outputs a reward at each reasoning step instead of only at the end, as a traditional outcome reward model does. I am trying to replicate the paper [[2305.20050] Let's Verify Step by Step](https://arxiv.org/abs/2305.20050).
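To make the difference concrete, here is a minimal sketch in plain Python of how the two reward signals could be laid out over a generated response. It does not use TRL and is not PPOTrainer's API: the function names, the per-token reward layout, and the idea of placing each step's score on the last token of that step are all assumptions made for illustration.

```python
from typing import List


def outcome_rewards(num_tokens: int, final_score: float) -> List[float]:
    """Outcome-style signal: a single score on the final token (hypothetical layout)."""
    rewards = [0.0] * num_tokens
    if num_tokens > 0:
        rewards[-1] = final_score
    return rewards


def process_rewards(
    num_tokens: int,
    step_end_indices: List[int],
    step_scores: List[float],
) -> List[float]:
    """Process-style signal: one score per reasoning step.

    Assumption: each step's score is placed on the token index where that
    step ends; every other token gets 0.0.
    """
    if len(step_end_indices) != len(step_scores):
        raise ValueError("need exactly one score per step")
    rewards = [0.0] * num_tokens
    for idx, score in zip(step_end_indices, step_scores):
        if not 0 <= idx < num_tokens:
            raise IndexError(f"step end index {idx} is outside the response")
        rewards[idx] = score
    return rewards


# A 6-token response split into three steps ending at tokens 1, 3 and 5.
print(outcome_rewards(6, 0.7))
print(process_rewards(6, [1, 3, 5], [0.9, 0.2, 0.7]))
```

With the outcome-style layout only the last token carries a reward, while the process-style layout carries one reward per step. What I am unsure about is whether PPOTrainer can consume the second kind of signal.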

Can a process reward model be used with PPOTrainer? If so, how do I configure it?
