GRPO trainer for old policy

nrajanee · February 19, 2025, 4:01pm

Hi! From what I understand, the old policy model in GRPO gets updated once per epoch. But here, in trl/trl/trainer/grpo_trainer.py at main · huggingface/trl · GitHub, I see that the current model's probabilities are also used as the old model's probabilities. That would work if the old model were updated after every batch, but not if it is only updated once per epoch, correct?

Thank you.
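A minimal sketch of the per-token clipped surrogate behind this question. It uses a hypothetical scalar helper `clipped_surrogate` written in plain Python, not TRL's code (TRL works on tensors with autograd). It assumes the PPO-style objective min(r·A, clip(r, 1 - ε, 1 + ε)·A) with r = exp(log π_θ - log π_old), and it computes the derivative with respect to log π_θ by hand while treating log π_old as a constant. The two cases show why reusing the current probabilities as the old ones is only equivalent when a single optimization step is taken per batch of generations. In that case the ratio is exactly 1, yet it still yields the policy-gradient term. Any later step on the same generations needs the old log-probabilities stored at generation time.

```python
import math


def clipped_surrogate(logp: float, logp_old: float, advantage: float, eps: float = 0.2):
    """Hypothetical helper: per-token clipped surrogate and its derivative w.r.t. logp.

    logp_old is treated as a constant (no gradient flows into it), like a
    detached tensor. Returns (objective, d_objective / d_logp).
    """
    ratio = math.exp(logp - logp_old)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    unclipped_term = ratio * advantage
    clipped_term = clipped * advantage
    if unclipped_term <= clipped_term:
        # min() selects the unclipped branch: d(ratio * A) / d logp = ratio * A
        return unclipped_term, ratio * advantage
    # min() selects the clipped branch, which is constant in logp here
    return clipped_term, 0.0


# Case 1: one optimization step per batch of generations.
# pi_old equals pi_theta at update time, so logp_old is a copy of logp:
# the ratio is exactly 1, but the derivative is still A (policy gradient).
logp = math.log(0.3)
obj1, grad1 = clipped_surrogate(logp, logp_old=logp, advantage=1.5)

# Case 2: a second step on the same generations after the policy moved.
# Here logp_old must be the value stored when the samples were generated;
# the ratio is about 1.5, so clipping kicks in and the derivative is 0.
logp_old = math.log(0.3)
logp_new = math.log(0.45)
obj2, grad2 = clipped_surrogate(logp_new, logp_old, advantage=1.5)

print(obj1, grad1)
print(obj2, grad2)
```

The scalar version keeps the example dependency-free. In PyTorch, case 1 corresponds to the common idiom `torch.exp(logp - logp.detach())`, whose value is 1 but whose gradient is that of `logp`.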
