RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect

I’m trying to perform domain adaptation on Llama 2 on AWS using the Hugging Face estimator.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=fault_tolerance_data_collator,
)

train_result = trainer.train()

I’m getting the following error:
train_result = trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1526, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1796, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2641, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2666, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 1091, in forward
    return self.base_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 160, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 756, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 603, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 531, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 49, in _make_causal_mask
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
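Since the error message itself warns that the reported stack trace may point at the wrong call, it may help to re-run with synchronous kernel launches so the trace lands on the real culprit. A minimal sketch (this assumes the environment variable is set before CUDA is initialized, i.e. before any tensor touches the GPU):

```python
import os

# Force synchronous CUDA kernel launches so a device-side assert surfaces
# at the Python call that actually triggered it, instead of at some later
# unrelated API call. Must be set before CUDA is first initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Alternatively, set it on the shell command line (`CUDA_LAUNCH_BLOCKING=1 python train.py`) or via the estimator's environment configuration, so it is in place before the training process starts.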

I have used the same code in Google Colab with a smaller model and it worked perfectly, but on AWS I get this error with Llama 2 and also with the smaller model.
I also checked that the model's token embedding size and the tokenizer length match:
model_vocab_size = model.get_output_embeddings().weight.size(0)
print(model_vocab_size)  # 32000

tokenizer_vocab_size = len(tokenizer)
print(tokenizer_vocab_size)  # 32000
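Even when the embedding size and tokenizer length agree, a device-side assert at the embedding lookup is often caused by individual token ids falling outside [0, vocab_size), e.g. from added special tokens or a padding id. A small sketch of such a check, using a hypothetical helper and toy data in place of the real `train_dataset["input_ids"]`:

```python
def find_out_of_range_ids(examples, vocab_size):
    """Return (example_index, token_id) pairs for ids outside [0, vocab_size)."""
    bad = []
    for i, ids in enumerate(examples):
        for tid in ids:
            if tid < 0 or tid >= vocab_size:
                bad.append((i, tid))
    return bad

# Toy stand-in for the tokenized dataset; substitute your real input_ids.
toy_input_ids = [[1, 5, 31999], [2, 32000, 7]]  # 32000 is out of range
print(find_out_of_range_ids(toy_input_ids, vocab_size=32000))
# -> [(1, 32000)]
```

Running this scan on the CPU before training would pinpoint any offending example, since out-of-range ids only surface on the GPU as the opaque device-side assert above.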

Please help in solving the issue. Thanks in advance!