I want to use the GPT-4 model with this script: trl/examples/scripts/ppo.py at main · huggingface/trl · GitHub. However, I could not add a GPT model to the pipeline as a reward model, because it comes from outside the Hugging Face models.
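To make the question concrete, here is a rough sketch of the kind of reward function I have in mind. It is not wired into ppo.py, and every name in it (`external_rewards`, `parse_score`, `ask_judge`, `fake_judge`) is hypothetical. `ask_judge` stands for any callable that sends a text prompt to an external model such as GPT-4 and returns its reply as a string; the stand-in below returns a fixed reply, so nothing here calls a real API.

```python
import re
from typing import Callable, List


def parse_score(reply: str, default: float = 0.0) -> float:
    """Pull the first number out of the judge's reply; use `default` if none is found."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else default


def external_rewards(
    prompts: List[str],
    responses: List[str],
    ask_judge: Callable[[str], str],
) -> List[float]:
    """Score each (prompt, response) pair with an external judge model.

    `ask_judge` is an assumption: any function that takes a text prompt and
    returns the judge's text reply (for example, a thin wrapper around an API client).
    """
    rewards = []
    for prompt, response in zip(prompts, responses):
        question = (
            "Rate the response to the prompt on a scale from 0 to 10. "
            "Reply with a single number.\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        rewards.append(parse_score(ask_judge(question)))
    return rewards


def fake_judge(question: str) -> str:
    """Stand-in for the external model, so the sketch runs without network access."""
    return "Score: 7"


print(external_rewards(["Hi"], ["Hello!"], fake_judge))  # prints [7.0]
```

What I cannot figure out is where a function like this would plug in, given that the script seems to expect the reward model to be a model loaded from Hugging Face.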