Good Evening,
I am trying to fine-tune meta-llama/Llama-3.2-1B for QA tasks. Before that, I am trying to understand the model in general by feeding it certain prompts to see how tokenization is carried out. However, I am facing a CUDA device-side assertion error. From my research, I understand that CUDA kernel launches are asynchronous (not serialized), so the reported stack trace may not point at the operation that actually failed. Could someone please help me debug this?
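To get a clearer error message, I am planning to rerun the script with synchronous CUDA launches (or on CPU), as is commonly suggested for device-side asserts. This is only a debugging sketch, not part of the actual pipeline:

import os

# Must be set before CUDA is initialized; forces synchronous kernel launches
# so the traceback points at the operation that actually fails.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, run the same code on CPU (drop the .to("cuda") calls)
# to get a readable Python error instead of a device-side assert.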
My Code:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model and tokenizer
model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(model_name, use_auth_token=True).to("cuda")

# DEFAULT_* special-token strings are defined earlier in my full script (not shown here)
special_tokens_dict = {
    "pad_token": DEFAULT_PAD_TOKEN,
    "eos_token": DEFAULT_EOS_TOKEN,
    "bos_token": DEFAULT_BOS_TOKEN,
    "unk_token": DEFAULT_UNK_TOKEN,
}
tokenizer.add_special_tokens(special_tokens_dict)
tokenizer.model_max_length = 2048

# Resize the embedding matrix to cover any newly added special tokens
model.resize_token_embeddings(len(tokenizer))
# Example prompt
prompt = "Explain the importance of machine learning in today's world."

# Tokenize input
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=512,  # start with a manageable length
    padding="max_length",
).to("cuda")

# Generate output
outputs = model.generate(
    inputs["input_ids"],
    max_length=256,
    temperature=0.7,
    top_p=0.9,
)

# Decode and print response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Response:", response)
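In case it is relevant: I have read that the Llama tokenizers ship without a pad token, and that generate() should be given the attention mask and max_new_tokens (rather than max_length) when the input is padded. Below is a simplified variant I am considering, which reuses the EOS token as padding instead of adding new special tokens (so no embedding resize is needed). I am not sure whether this is the correct fix:

# Alternative I am considering: reuse EOS as the pad token instead of adding new tokens
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # tells generate which tokens are padding
    max_new_tokens=256,                       # length of the generated continuation only
    do_sample=True,                           # temperature/top_p only take effect when sampling
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Would this be the right direction, or is the original add_special_tokens + resize_token_embeddings approach fine and the error is coming from somewhere else?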