CUDA error: device-side assert triggered after a certain number of steps

I am getting the same error.
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Same setup here: batch size 2, and it fails at step 32 while pretraining a Llama model.
Did anyone find a solution?
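
In general, one frequent cause of this assert is an out-of-range index, for example a token id that is greater than or equal to the size of the embedding table. Whether that applies to this case is only an assumption. Below is a minimal sketch of a CPU-side check on the batch before it reaches the GPU; the function name `find_out_of_range_ids`, the example batch, and the vocabulary size of 32000 are all hypothetical and not taken from the failing run.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batch: Iterable[Iterable[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            # An embedding lookup is only valid for 0 <= token_id < vocab_size.
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, token_id))
    return bad


# Hypothetical batch of size 2; 32000 is an assumed vocabulary size.
example_batch = [[1, 15043, 3186], [1, 32000, 2]]
problems = find_out_of_range_ids(example_batch, vocab_size=32000)
print(problems)  # [(1, 1, 32000)]
```

With real data, the batch would be the input ids converted to plain Python lists, and `vocab_size` would be the number of rows in the model's embedding table. Label tensors can legitimately contain an ignore index such as -100, so this check is meant for input ids rather than labels.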