What to do about the non-finite norm warning in `clip_grad_norm_`?

I started to see this warning while training a language model:

```
FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
```

Is this an indicator that my model is not training well? And if so, are there any recommendations on what to change? Thanks!
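For context, here is a minimal sketch of the kind of check that would show whether the gradients are already non-finite before clipping. The model, inputs, and `max_norm` value below are just stand-ins, not my actual setup:

```python
import torch
import torch.nn as nn

# Stand-in model and batch so the snippet runs on its own;
# substitute your own language model, inputs, and loss.
model = nn.Linear(16, 16)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

# Flag any parameter whose gradient is already NaN/Inf before clipping;
# a non-finite total norm in clip_grad_norm_ means at least one of these fired.
for name, param in model.named_parameters():
    if param.grad is not None and not torch.isfinite(param.grad).all():
        print(f"non-finite gradient in {name}")

# This is the call that emits the FutureWarning when the total norm is non-finite.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print("total grad norm:", float(total_norm))
```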

Any insights on this?

@sgugger Can you please provide some insight? I get this warning even before training starts with the Trainer.

I have never encountered that warning, so I will look into it. It looks like a behavior change is coming in a future PyTorch release.
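In the meantime, the warning itself says how to handle the deprecation side of it: pass `error_if_nonfinite` explicitly when you call `clip_grad_norm_` yourself. A rough sketch below (placeholder model, example `max_norm`, and assuming a PyTorch version recent enough to have the keyword); how or whether this can be forwarded through the Trainer is a separate question:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                       # placeholder for the real model
model(torch.randn(4, 16)).sum().backward()

# Keep the old behavior explicitly: continue anyway on a non-finite total norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0,
                               error_if_nonfinite=False)

# Or opt in to the future default now and fail fast when gradients blow up.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0,
                               error_if_nonfinite=True)
```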