Hey everyone!
I'm trying to train my model with the `PPOTrainer` from the TRL library, but I keep getting negative KL values. Any idea where I might have gone wrong?
The configs:
generation_kwargs = {
    "do_sample": True,
    "top_k": 9,
    "max_length": 1024,
    "top_p": 0.9,
}

dataset = train_dataset

ppo_config = {
    "mini_batch_size": 1,
    "batch_size": 1,
    "learning_rate": 1.41e-5,
}
config = PPOConfig(**ppo_config)

ppo_trainer = PPOTrainer(config, model, tokenizer=tokenizer, dataset=dataset)
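In case it helps with diagnosis: as far as I understand, the per-token KL term is estimated from tokens sampled during generation, roughly as log p_policy(token) - log p_ref(token). If decoding is constrained (e.g., by `top_k`/`top_p`), the generated tokens are no longer a sample from the full policy distribution, so this estimate can come out negative even though the true KL divergence is non-negative. Here's a toy sketch of that effect in plain Python (the distributions are made up for illustration, not taken from TRL):

```python
import math

# Hypothetical next-token distributions over a 3-token vocabulary.
policy = [0.40, 0.35, 0.25]  # current policy
ref    = [0.90, 0.05, 0.05]  # frozen reference model

# Extreme truncation (top_k=1): always pick the policy's argmax token.
token = max(range(len(policy)), key=lambda i: policy[i])

# Per-token KL estimate: log p_policy(token) - log p_ref(token).
kl_estimate = math.log(policy[token]) - math.log(ref[token])

print(token)        # 0: the policy's argmax
print(kl_estimate)  # negative, because the ref puts *more* mass on token 0
```

Here the estimate is log(0.40) - log(0.90) ≈ -0.81, even though KL(policy || ref) itself is positive, because truncated sampling only ever visits a token the reference prefers. Not sure if that's exactly what's happening in my run, but it's why I'm suspicious of my `top_k`/`top_p` settings.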