Llama model outputs strange words

Hello,

I’m trying to run inference with the Llama-3.2-3B-Instruct model.

While using .apply_chat_template, it seems the method returns only a bare tensor of token IDs, so there are no separate input_ids and attention_mask that I can pass to model.generate().
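From the tokenizer docs I believe apply_chat_template can also return a dict when return_dict=True is passed, which should carry both input_ids and attention_mask. Here is a minimal sketch of the helper I have in mind (build_generation_inputs is my own name, and I haven't verified that this fixes the garbled output):

```python
def build_generation_inputs(tokenizer, conversations, device):
    """Tokenize chat conversations into a dict with input_ids AND attention_mask.

    Assumes return_dict=True is supported by tokenizer.apply_chat_template
    (my reading of the docs; unverified on my setup).
    """
    inputs = tokenizer.apply_chat_template(
        conversations,
        tokenize=True,
        padding=True,
        add_generation_prompt=True,
        return_dict=True,       # ask for a dict instead of a bare tensor
        return_tensors="pt",
    )
    # Move every tensor in the dict (input_ids, attention_mask) to the device.
    return {k: v.to(device) for k, v in inputs.items()}
```

If that's right, the result could then be unpacked as model.generate(**inputs, max_new_tokens=30, eos_token_id=terminators), so the attention mask actually reaches generate().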

Here’s the code.

import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "my_local_path"
CSV_FILE = "mycsv.csv"

data = pd.read_csv(CSV_FILE)
findings = data['findings']

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token; reuse EOS to avoid an error

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, 
    pad_token_id=tokenizer.eos_token_id,
    device_map="auto"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

batch_prompt = []

for finding in findings[:3]:
    messages = [
        {
            "role": "system",
            "content": "You are a AI assistant to make summary for the document.",
        },
        {
            "role": "user",
            "content": f"{finding}Summarize this sentences into one sentence",
        }
    ]

    batch_prompt.append(messages)

print("Batch prompt:", batch_prompt)
print("---"*10)

# NOTE: with tokenize=True this returns a bare tensor of input_ids only,
# not a dict, so there is no attention_mask to forward to generate()
tokenized_chat = tokenizer.apply_chat_template(
    batch_prompt[0],
    tokenize=True,
    padding=True, 
    add_generation_prompt=True, 
    return_tensors='pt'
).to(device)


terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

print("This is tokenized chat:", tokenized_chat)
print("---"*10)

print(tokenizer.decode(tokenized_chat[0]))
print("==="*10)

outputs = model.generate(
    tokenized_chat,  # positional input_ids only; no attention_mask is passed
    max_new_tokens=30,
    eos_token_id=terminators
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

And here’s the output.

Loading checkpoint shards: 100%
 2/2 [00:03<00:00,  1.41s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Batch prompt: [[{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentence'}], [{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'borderline cardiomegaly. midline sternotomy xxxx. enlarged pulmonary arteries. clear lungs. inferior xxxx xxxx xxxx.Summarize this sentences into one sentence'}], [{'role': 'system', 'content': 'You are a AI assistant to make summary for the document.'}, {'role': 'user', 'content': 'there are diffuse bilateral interstitial and alveolar opacities consistent with chronic obstructive lung disease and bullous emphysema. there are irregular opacities in the left lung apex, that could represent a cavitary lesion in the left lung apex.there are streaky opacities in the right upper lobe, xxxx scarring. the cardiomediastinal silhouette is normal in size and contour. there is no pneumothorax or large pleural effusion.Summarize this sentences into one sentence'}]]
------------------------------
This is tokenized chat: tensor([[128000, 128006,   9125, 128007,    271,  38766,   1303,  33025,   2696,
             25,   6790,    220,   2366,     18,    198,  15724,   2696,     25,
            220,    966,   4723,    220,   2366,     19,    271,   2675,    527,
            264,  15592,  18328,    311,   1304,  12399,    369,    279,   2246,
             13, 128009, 128006,    882, 128007,    271,   1820,  47345,  57827,
            323,  25098,    561,    258,    372,   1404,    527,   2949,   4725,
          13693,     13,   1070,    374,    912,  70524,   1608,   9355,     13,
           1070,    374,    912,  42199,  60732,     13,   1070,    527,    912,
          85076,    315,    264,   7245,   4269,   3369,   7713,     13,   1070,
            374,    912,   6029,    315,  57223,   8942,    269,    710,  42776,
           5730,    553,    420,  23719,   1139,    832,  11914, 128009, 128006,
          78191, 128007,    271]], device='cuda:0')
------------------------------
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 30 Nov 2024

You are a AI assistant to make summary for the document.<|eot_id|><|start_header_id|>user<|end_header_id|>

the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentence<|eot_id|><|start_header_id|>assistant<|end_header_id|>


==============================
system

Cutting Knowledge Date: December 2023
Today Date: 30 Nov 2024

You are a AI assistant to make summary for the document.user

the cardiac silhouette and mediastinum size are within normal limits. there is no pulmonary edema. there is no focal consolidation. there are no xxxx of a pleural effusion. there is no evidence of pneumothorax.Summarize this sentences into one sentenceassistant

 strang strang thrott crossAxisAlignmentAxisAlignment将将将 otra anotherorfast对将柱pur将agr将 están将 lep Lep Lep Cracksuma将ضم__/__/

Please help me figure out what’s going wrong.

P.S. I also wonder how to run batched inference across multiple GPUs. Do I have to build the batches manually and send them to the model one at a time?
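To make the question concrete, here is the kind of manual batching I have in mind; chunked is a hypothetical helper of mine, and I don’t know whether device_map="auto" already makes this unnecessary:

```python
def chunked(items, batch_size):
    """Yield successive batch_size-sized slices of a list of prompts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage, assuming apply_chat_template accepts a list of
# conversations and return_dict=True works as I described above:
#
# for batch in chunked(batch_prompt, 8):
#     inputs = tokenizer.apply_chat_template(
#         batch, tokenize=True, padding=True,
#         add_generation_prompt=True, return_dict=True, return_tensors="pt",
#     ).to(model.device)
#     outputs = model.generate(**inputs, max_new_tokens=30)
```

Is this roughly the intended pattern, or is there a built-in way to spread batches over multiple GPUs?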
