TRL loss blowing up

Hello @lvwerra, @natolambert, I am trying to improve a Pegasus model in certain aspects using the TRL library, with a reward function based on ROUGE. While training on a subset of the CNN dataset, the loss explodes and the model starts producing gibberish. Since I am new to this area, I would appreciate some help understanding the problem. You can view the Wandb logs here.

Best,
Raj

Hi @RajSang, could you please share a Colab notebook or a minimal example that reproduces your problem? That will help us understand what's going wrong 🙂

Thanks for responding, @lewtun! Here is the Colab notebook.