Tokenizer setting for `LlamaForCausalLM.from_pretrained(model_path, device_map='auto')`

I'm running on 2 A100 GPUs, so I load the model with `device_map='auto'`, and then I get the error below. My setup:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the pad token
# shard the model across both A100s via accelerate
model = LlamaForCausalLM.from_pretrained(model_path, device_map='auto')
model.resize_token_embeddings(len(tokenizer))
```
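
For reference, transformers populates `model.hf_device_map` when loading with `device_map='auto'`, so the sharding can be inspected like this:

```python
# Each entry maps a submodule name to the device accelerate placed it on,
# which shows how the layers are split across the two A100s.
for name, device in model.hf_device_map.items():
    print(name, '->', device)
```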

Generation then fails:

```python
conversation_str = 'test....................'
inputs = tokenizer(conversation_str, return_tensors='pt').to('cuda')
generate_ids = model.generate(inputs.input_ids, max_length=4096)
```
```
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
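
As the traceback suggests, `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so the assert points at the real call site; it has to be set before CUDA is initialized:

```python
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # must be set before torch initializes CUDA
import torch  # imported after the env var so kernels launch synchronously
```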

My guess is that this is a data-parallelism / device-placement issue, but I haven't been able to solve it.
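
In case it helps narrow things down, this is the variant I would expect to be safe when the model is sharded (using `model.device` instead of a hard-coded `'cuda'`, and passing the attention mask and pad token id explicitly, are my assumptions, not a confirmed fix):

```python
# With device_map='auto', model.device is the device of the first shard,
# which is where input_ids need to live.
inputs = tokenizer(conversation_str, return_tensors='pt').to(model.device)
generate_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # be explicit since pad == eos
    pad_token_id=tokenizer.pad_token_id,
    max_length=4096,
)
```

Is that the correct input placement for a model sharded with `device_map='auto'`, or is something wrong with my tokenizer setup itself?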