I am getting the same error:
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Same case here: batch size 2, failing at step 32 while pretraining the Llama model.
Did anyone find a solution?
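
For anyone else debugging this: as far as I know, this assert is often (not always) triggered by an out-of-range index, for example a token ID that is greater than or equal to the model's vocabulary size reaching the embedding lookup. Below is a minimal sketch of a check that could be run on each batch before the forward pass. The helper name `find_out_of_range_ids`, the example batch, and the vocabulary size of 32000 are assumptions for illustration only, not values taken from the failing run.

```python
def find_out_of_range_ids(batch_ids, vocab_size):
    """Return (row, col, token_id) for every ID outside [0, vocab_size)."""
    bad = []
    for row, seq in enumerate(batch_ids):
        for col, token_id in enumerate(seq):
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, token_id))
    return bad


# Hypothetical batch of size 2 as nested lists; with a PyTorch tensor,
# pass input_ids.tolist() instead.
example_batch = [[1, 15, 31999], [1, 32000, 2]]

# Assumed vocabulary size; use the value from your own model config.
print(find_out_of_range_ids(example_batch, vocab_size=32000))
# prints [(1, 1, 32000)]
```

This check is meant for input IDs; label tensors often legitimately contain -100 as the ignore value for the loss, so they would need that value excluded. Rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or running the failing batch on CPU, should also give a traceback that points closer to the operation that actually failed.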