Negative KL values during PPO training (TRL library)

Hey everyone!
I'm trying to train my model with the PPO trainer from the TRL library, but I'm getting negative KL values. Any idea where I might have gone wrong?
The configs:
generation_kwargs = {
    "do_sample": True,
    "top_k": 9,
    "max_length": 1024,
    "top_p": 0.9,
}

dataset = train_dataset

ppo_config = {ā€œmini_batch_sizeā€: 1,
ā€œbatch_sizeā€: 1,
ā€œlearning_rateā€: 1.41e-5,
}
from trl import PPOConfig, PPOTrainer

config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config, model, tokenizer=tokenizer, dataset=dataset)
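From what I understand, TRL reports KL as a per-token estimate (policy log-prob minus reference log-prob for the sampled token), and a single-sample estimate of that form can go negative even though the true KL divergence is always non-negative. Here's a toy check I put together (plain Python, no TRL) to convince myself of that:

```python
import math

# Two toy next-token distributions over a 3-token vocabulary:
# p = policy, q = frozen reference model.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# True KL(p || q) is an expectation over p and is always >= 0.
true_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Single-sample estimate for a sampled token x: log p(x) - log q(x).
# Whenever q(x) > p(x) for the token that was actually sampled,
# this estimate is negative, even though the true KL is positive.
per_token = [math.log(pi) - math.log(qi) for pi, qi in zip(p, q)]

print(true_kl)       # positive
print(per_token[1])  # negative: log(0.3) - log(0.4) < 0
```

So occasional negative per-batch KL isn't necessarily a bug by itself, though with `batch_size: 1` the estimate is especially noisy; sampling restrictions like `top_k`/`top_p` can also make the generated tokens diverge from what the raw policy distribution would predict.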