Hi,
I’m very new to this topic.
I would like to use the Stable Baselines3 library in my project.
For now, I want to write code that teaches an agent to play CartPole-v1, but not by using the learn() method.
I want to do the following steps:
1 Use model.predict()
2 Use env.step(action)
3 Compute log_prob
4 Add all this stuff (obs, action_prev, reward, log_prob, etc.) to the rollout_buffer
5 Use model.train()
In theory, it should work. I can run my code, but anyway, my PPO model doesn’t learn. The mean reward doesn’t change significantly.
Do you have any advice on how I should do this?