Hi @Pizofreude,
I just published two blog posts about recent RL algorithms for reasoning tasks, such as GRPO and Dr. GRPO.
You can find them on Medium:
- The Evolution of Policy Optimization: Understanding GRPO, DAPO, and Dr. GRPO’s Theoretical Foundations
- Bridging Theory and Practice: Understanding GRPO Implementation Details in Hugging Face’s TRL Library
I hope you find them helpful.
Thank you,
Jen