Stable Baselines3 - Training a model without calling learn()

I’m very new to this topic.
I would like to use the Stable Baselines3 library in my project.
For now, I want to write code that teaches an agent to play CartPole-v1, but not by using the learn() method.

I want to do the following steps:

1. Call model.predict() to get an action
2. Call env.step(action)
3. Compute log_prob for that action
4. Add everything (obs, action_prev, reward, log_prob, etc.) to the rollout_buffer
5. Call model.train()

In theory, this should work. My code runs, but my PPO model doesn’t learn: the mean reward doesn’t change significantly.

Do you have any advice on how I should do this?