Llama model outputs strange words

Hello,

I’m trying to run inference with the Llama-3.2-3B-Instruct model.

While using .apply_chat_template, it seems the method returns only a bare tensor of token IDs, so there are no separate input_ids and attention_mask that I can pass to model.generate().
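From the tokenizer docs I believe apply_chat_template can also return a dict when return_dict=True is passed, which should carry both input_ids and attention_mask. Here is a minimal sketch of the helper I have in mind (build_generation_inputs is my own name, and I haven't verified that this fixes the garbled output):

```python
def build_generation_inputs(tokenizer, conversations, device):
    """Tokenize chat conversations into a dict with input_ids AND attention_mask.

    Assumes return_dict=True is supported by tokenizer.apply_chat_template
    (my reading of the docs; unverified on my setup).
    """
    inputs = tokenizer.apply_chat_template(
        conversations,
        tokenize=True,
        padding=True,
        add_generation_prompt=True,
        return_dict=True,       # ask for a dict instead of a bare tensor
        return_tensors="pt",
    )
    # Move every tensor in the dict (input_ids, attention_mask) to the device.
    return {k: v.to(device) for k, v in inputs.items()}
```

If that's right, the result could then be unpacked as model.generate(**inputs, max_new_tokens=30, eos_token_id=terminators), so the attention mask actually reaches generate().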

Here’s the code.

import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "my_local_path"
CSV_FILE = "mycsv.csv"

data = pd.read_csv(CSV_FILE)
findings = data['findings']

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token; reuse EOS to avoid an error

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, 
    pad_token_id=tokenizer.eos_token_id,
    device_map="auto"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

batch_prompt = []

for finding in findings[:3]:
    messages = [
        {
            "role": "system",
            "content": "You are a AI assistant to make summary for the document.",
        },
        {
            "role": "user",
            "content": f"{finding}Summarize this sentences into one sentence",
        }
    ]

    batch_prompt.append(messages)

print("Batch prompt:", batch_prompt)
print("---"*10)

# NOTE: with tokenize=True this returns a bare tensor of input_ids only,
# not a dict, so there is no attention_mask to forward to generate()
tokenized_chat = tokenizer.apply_chat_template(
    batch_prompt[0],
    tokenize=True,
    padding=True, 
    add_generation_prompt=True, 
    return_tensors='pt'
).to(device)


terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

print("This is tokenized chat:", tokenized_chat)
print("---"*10)

print(tokenizer.decode(tokenized_chat[0]))
print("==="*10)

outputs = model.generate(
    tokenized_chat,  # positional input_ids only; no attention_mask is passed
    max_new_tokens=30,
    eos_token_id=terminators
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

And here’s the output.

Loading checkpoint shards: 100%
 2/2 [00:03<00:00,  1.41s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Batch prompt: [[{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentence'}], [{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'borderline cardiomegaly. midline sternotomy xxxx. enlarged pulmonary arteries. clear lungs. inferior xxxx xxxx xxxx.Summarize this sentences into one sentence'}], [{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'there are diffuse bilateral interstitial and alveolar opacities consistent with chronic obstructive lung disease and bullous emphysema. there are irregular opacities in the left lung apex, that could represent a cavitary lesion in the left lung apex.there are streaky opacities in the right upper lobe, xxxx scarring. the cardiomediastinal silhouette is normal in size and contour. there is no pneumothorax or large pleural effusion.Summarize this sentences into one sentence'}]]
------------------------------
This is tokenized chat: tensor([[128000, 128006,   9125, 128007,    271,  38766,   1303,  33025,   2696,
             25,   6790,    220,   2366,     18,    198,  15724,   2696,     25,
            220,    966,   4723,    220,   2366,     19,    271,   2675,    527,
            264,  15592,  18328,    311,   1304,  12399,    369,    279,   2246,
             13, 128009, 128006,    882, 128007,    271,   1820,  47345,  57827,
            323,  25098,    561,    258,    372,   1404,    527,   2949,   4725,
          13693,     13,   1070,    374,    912,  70524,   1608,   9355,     13,
           1070,    374,    912,  42199,  60732,     13,   1070,    527,    912,
          85076,    315,    264,   7245,   4269,   3369,   7713,     13,   1070,
            374,    912,   6029,    315,  57223,   8942,    269,    710,  42776,
           5730,    553,    420,  23719,   1139,    832,  11914, 128009, 128006,
          78191, 128007,    271]], device='cuda:0')
------------------------------
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 30 Nov 2024

You are a AI assistant to make summary for the document.<|eot_id|><|start_header_id|>user<|end_header_id|>

the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentence<|eot_id|><|start_header_id|>assistant<|end_header_id|>


==============================
system

Cutting Knowledge Date: December 2023
Today Date: 30 Nov 2024

You are a AI assistant to make summary for the document.user

the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentenceassistant

 strang strang thrott crossAxisAlignmentAxisAlignment将将将 otra anotherorfast对将柱pur将agr将 están将 lep Lep Lep Cracksuma将ضم__/__/

Please help me figure out what’s going wrong.

P.S. I also wonder how to run batched inference across multiple GPUs. Do I have to build the batches manually and send them to the model one at a time?
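To make the question concrete, here is the kind of manual batching I have in mind; chunked is a hypothetical helper of mine, and I don’t know whether device_map="auto" already makes this unnecessary:

```python
def chunked(items, batch_size):
    """Yield successive batch_size-sized slices of a list of prompts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage, assuming apply_chat_template accepts a list of
# conversations and return_dict=True works as I described above:
#
# for batch in chunked(batch_prompt, 8):
#     inputs = tokenizer.apply_chat_template(
#         batch, tokenize=True, padding=True,
#         add_generation_prompt=True, return_dict=True, return_tensors="pt",
#     ).to(model.device)
#     outputs = model.generate(**inputs, max_new_tokens=30)
```

Is this roughly the intended pattern, or is there a built-in way to spread batches over multiple GPUs?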
