CUDA Assertion Error

Good Evening,

I am trying to fine-tune meta-llama/Llama-3.2-1B for QA tasks. Before that, I am trying to understand the model in general by feeding in certain prompts to see how tokenization is carried out. However, I am facing a CUDA assertion error. From my research, I understand it is related to the GPU executing kernels asynchronously, which obscures the real cause of the error. Could someone please help me debug this?

My Code:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(model_name, use_auth_token=True).to("cuda")

special_tokens_dict = {
    "pad_token": DEFAULT_PAD_TOKEN,
    "eos_token": DEFAULT_EOS_TOKEN,
    "bos_token": DEFAULT_BOS_TOKEN,
    "unk_token": DEFAULT_UNK_TOKEN,
}
tokenizer.add_special_tokens(special_tokens_dict)
tokenizer.model_max_length = 2048
model.resize_token_embeddings(len(tokenizer))

# Example prompt
prompt = "Explain the importance of machine learning in today's world."

# Tokenize input
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=512,  # Start with a manageable length
    padding="max_length",
).to("cuda")

# Generate output
outputs = model.generate(
    inputs["input_ids"],
    max_length=256,
    temperature=0.7,
    top_p=0.9,
)

# Decode and print response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Response:", response)

I was able to rectify the issue. Running the same code on CPU provided a clear error message (CUDA reports failures asynchronously, so on the GPU you only see a generic device-side assert).
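For anyone who hits the same assert, a minimal sketch of how to surface the underlying error — either force synchronous CUDA execution or move everything to CPU first (illustrative, not specific to this model):

import os
# Must be set before CUDA is initialized; kernel launches become synchronous,
# so the traceback points at the op that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, run the same pipeline on CPU to get a readable Python error:
model = model.to("cpu")
inputs = inputs.to("cpu")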

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model.resize_token_embeddings(len(tokenizer))

After adding these lines, the error was resolved.
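For completeness, here is a minimal end-to-end sketch with the fix applied. The attention_mask, max_new_tokens, do_sample, and pad_token_id arguments are my additions rather than part of the original post, but they avoid the usual follow-up warnings from generate:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(model_name, use_auth_token=True).to("cuda")

# The fix: give the tokenizer a pad token and keep embeddings in sync
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.resize_token_embeddings(len(tokenizer))

prompt = "Explain the importance of machine learning in today's world."
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding="max_length",
).to("cuda")

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # mask out the padded positions
    max_new_tokens=256,                       # generate up to 256 new tokens
    do_sample=True,                           # temperature/top_p only apply when sampling
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))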
