Unexpected Output from Official Llama-3.2-11B-Vision-Instruct Example Code

Hi all,

I am trying out the official example provided at meta-llama/Llama-3.2-11B-Vision-Instruct · Hugging Face but got an unexpected response:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|██████████| 5/5 [02:32<00:00, 30.46s/it]
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm not able to provide information about individuals. Can you tell me something about the person in this picture? I can give you an idea of what

Notably, the model mentions ‘I’m not able to provide information about individuals,’ even though the image is of a rabbit, and is exactly the same image as in the official example.

I changed the haiku prompt to ‘Describe the image.’ as below, with everything else remaining unchanged.

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the image."}
    ]}
]

but the model still refuses to provide information. The response is below:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|██████████| 5/5 [00:47<00:00,  9.49s/it]
Some parameters are on the meta device because they were offloaded to the cpu.
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm not able to provide that information. I can give you an idea of what's happening in the image, but not names. The image depicts

Chat Template Issue?

I read the discussion about the chat template (meta-llama/Llama-3.2-11B-Vision-Instruct · Chat template problem.). It seemed that the issue had been resolved and ‘chat_template.json’ had been updated. However, when using the official example, the response is still weird.

The program started to work when I directly modified the input_text as below:

# input_text = processor.apply_chat_template(messages, add_generation_prompt=True) # original, commented out
input_text = "<|image|> If I had to write a haiku for this one, it would be: "

with the response (not perfect but at least making some sense):

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|██████████| 5/5 [02:41<00:00, 32.34s/it]
<|image|><|begin_of_text|> If I had to write a haiku for this one, it would be: 1. Peter Rabbit is a character from a series of children's books written by Beatrix Potter. He is a mischievous and adventurous young rabbit

This seemed weird to me, as with this modified input_text the leading ‘<|begin_of_text|><|start_header_id|>user<|end_header_id|>’ and the trailing ‘<|eot_id|><|start_header_id|>assistant<|end_header_id|>’ are removed. I am not sure this is the correct way to fix the issue, as the model may perform suboptimally, but it does point to a potential chat template issue. Also, this prompt deviates from the official vision prompt format (llama-models/models/llama3_2/vision_prompt_format.md at main · meta-llama/llama-models · GitHub).
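For reference, the documented single-turn format written out as a raw string would look roughly like this (my reconstruction from the rendered prompts printed above; note the newlines after the header tokens):

# The prompt apply_chat_template is expected to produce for a single user turn
# (reconstructed; matches the rendered prompt shown in the logs above).
input_text = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "<|image|>If I had to write a haiku for this one, it would be: "
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)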

Environment

I kept everything the same as in meta-llama/Llama-3.2-11B-Vision-Instruct · Hugging Face except loading the model and image from local as below:

model_id = <model local dir>
image = Image.open('.../rabbit.jpg')
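For completeness, the rest of my script follows the model card example, roughly like this (a sketch; the two paths are placeholders for my local files):

# Sketch of the full script: identical to the model card example except for
# the local model directory and image path (placeholders below).
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "/path/to/local/Llama-3.2-11B-Vision-Instruct"  # placeholder

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("/path/to/rabbit.jpg")  # placeholder; same rabbit image as in the example

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))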

The model was downloaded with snapshot_download (from huggingface_hub import snapshot_download). ‘chat_template.json’ was not downloaded by snapshot_download, so I manually created a file with that name and copied and pasted the content. The image is exactly the same as the one in the example.
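For anyone else hitting this, the single file can also be fetched explicitly; a sketch (local_dir is a placeholder, and the gated repo still requires access):

# Fetch chat_template.json explicitly into the local snapshot directory
# (an alternative to copy-pasting the contents by hand; local_dir is a placeholder).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    filename="chat_template.json",
    local_dir="/path/to/local/Llama-3.2-11B-Vision-Instruct",
)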

I have the following packages installed. I have transformers==4.45.0 to match that of https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/blob/main/generation_config.json.

Package                        Version
------------------------------ -------------------------
accelerate                     1.0.1
aiohappyeyeballs               2.4.3
aiohttp                        3.10.9
aiosignal                      1.3.1
annotated_types                0.7.0
anyio                          4.6.2.post1
argon2_cffi                    23.1.0
argon2_cffi_bindings           21.2.0
arrow                          1.3.0
asttokens                      2.4.1
async_lru                      2.0.4
async_timeout                  4.0.3
attrs                          24.2.0
babel                          2.16.0
beautifulsoup4                 4.12.3
bleach                         6.1.0
certifi                        2024.8.30
cffi                           1.16.0
charset_normalizer             3.4.0
comm                           0.2.2
contourpy                      1.2.1
cycler                         0.12.1
dataclasses_json               0.6.7
datasets                       3.0.2
debugpy                        1.8.1
decorator                      5.1.1
defusedxml                     0.7.1
dill                           0.3.8
distro                         1.9.0
exceptiongroup                 1.2.1
executing                      2.0.1
fastjsonschema                 2.20.0
filelock                       3.16.1
fonttools                      4.53.0
fqdn                           1.5.1
frozenlist                     1.5.0
fsspec                         2024.9.0
greenlet                       2.0.2
h11                            0.14.0
httpcore                       1.0.6
httpx                          0.27.2
httpx-sse                      0.4.0
huggingface_hub                0.26.1
idna                           3.10
ipykernel                      6.29.4
ipython                        8.25.0
isoduration                    20.11.0
jedi                           0.19.1
jinja2                         3.1.4
jiter                          0.6.1
joblib                         1.4.2
json5                          0.9.25
jsonpatch                      1.33
jsonpointer                    3.0.0
jsonschema                     4.23.0
jsonschema_specifications      2024.10.1
jupyter_client                 8.6.2
jupyter_core                   5.7.2
jupyter_events                 0.10.0
jupyter_lsp                    2.2.5
jupyter_server                 2.14.2
jupyter_server_terminals       0.5.3
jupyterlab                     4.2.5
jupyterlab_pygments            0.3.0
jupyterlab_server              2.27.3
kiwisolver                     1.4.5
langchain                      0.3.4
langchain-community            0.3.3
langchain-core                 0.3.13
langchain-huggingface          0.1.0
langchain-openai               0.2.3
langchain-text-splitters       0.3.0
langchainhub                   0.1.21
langgraph                      0.2.39
langgraph-checkpoint           2.0.2
langgraph-sdk                  0.1.34
langsmith                      0.1.137
MarkupSafe                     2.1.5
marshmallow                    3.23.0
matplotlib                     3.9.0
matplotlib_inline              0.1.7
mistune                        3.0.2
mpmath                         1.3.0
msgpack                        1.1.0
multidict                      6.1.0
multiprocess                   0.70.16
mypy_extensions                1.0.0
nbclient                       0.10.0
nbconvert                      7.16.4
nbformat                       5.10.4
nest_asyncio                   1.6.0
networkx                       3.4.2
nose                           1.3.7
notebook_shim                  0.2.4
numpy                          1.26.4
openai                         1.52.2
opencv_contrib_python          4.10.0
opencv_contrib_python_headless 4.10.0
opencv_python                  4.10.0
opencv_python_headless         4.10.0
orjson                         3.10.5
overrides                      7.7.0
packaging                      24.1
pandas                         2.2.1
pandocfilters                  1.5.1
parso                          0.8.4
pexpect                        4.9.0
Pillow                         9.4.0
pip                            23.0.1
platformdirs                   3.9.1
prometheus_client              0.21.0
prompt_toolkit                 3.0.47
propcache                      0.2.0
psutil                         5.9.8
ptyprocess                     0.7.0
pure_eval                      0.2.2
pyarrow                        17.0.0
pycparser                      2.22
pydantic                       2.9.2
pydantic_core                  2.23.4
pydantic-settings              2.6.0
pygments                       2.18.0
pyparsing                      3.1.2
python_dateutil                2.9.0.post0
python_dotenv                  1.0.1
python_json_logger             2.0.7
pytz                           2024.1
PyYAML                         6.0.1
pyzmq                          26.0.3
referencing                    0.35.1
regex                          2024.9.11
requests                       2.32.3
requests_toolbelt              1.0.0
rfc3339_validator              0.1.4
rfc3986_validator              0.1.1
rpds_py                        0.20.0
safetensors                    0.4.5
scikit_learn                   1.5.0
scipy                          1.13.1
Send2Trash                     1.8.3
sentence-transformers          3.2.1
setuptools                     65.5.0
six                            1.16.0
sniffio                        1.3.1
soupsieve                      2.6
SQLAlchemy                     2.0.36
stack_data                     0.6.3
sympy                          1.13.1
tenacity                       9.0.0
terminado                      0.18.1
threadpoolctl                  3.5.0
tiktoken                       0.7.0
tinycss2                       1.4.0
tokenizers                     0.20.0
tomli                          2.0.2
torch                          2.5.0
tornado                        6.3.3
tqdm                           4.66.5
traitlets                      5.14.3
transformers                   4.45.0
types-python-dateutil          2.9.0.20241003
types-requests                 2.32.0.20241016
typing_extensions              4.12.2
typing_inspect                 0.9.0
tzdata                         2024.1
uri_template                   1.3.0
urllib3                        2.2.3
wcwidth                        0.2.13
webcolors                      24.8.0
webencodings                   0.5.1
websocket_client               1.8.0
xxhash                         3.5.0
yarl                           1.16.0

OS-wise, I am running

python/3.10.13
cuda/12.2
cudnn/9.2.1.18

Has anyone encountered similar issues or have suggestions on how to resolve this? Any input is much appreciated. Thanks!


processor.apply_chat_template(messages, add_generation_prompt=True) is an important process. It translates messages for the LLM. Do not comment it out. Instead, change the contents of messages.

# input_text = processor.apply_chat_template(messages, add_generation_prompt=True) # original, commented out
input_text = "<|image|> If I had to write a haiku for this one, it would be: "

Thank you for the input, but that is exactly the question. When I used the example code as-is (utilizing processor.apply_chat_template), the response is not useful, e.g. ‘I'm not able to provide information about individuals. Can you tell me something about the person in this picture? I can give you an idea of what’.

That is the main issue and why I tried commenting it out; it worked better, though still not perfectly.


I see. There are two main workarounds. One is to manually configure the template as you did, and the other is to duplicate the model repo and fix the template file there.
For the former, I think it is important to write the text including the line-feed characters (\n), referring to the prompt format page and the code on GitHub. There are various options, but in essence the program is also just editing text, so it is manageable.
As for the latter, duplication is possible with the official HF utilities.
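A rough sketch of the duplication route with huggingface_hub (the destination repo name is a placeholder, and you need access to the gated source repo):

# Rough sketch: copy the model repo into your own namespace, then edit
# chat_template.json in the copy and load the model from that repo instead.
from huggingface_hub import HfApi, snapshot_download

api = HfApi()
local_dir = snapshot_download("meta-llama/Llama-3.2-11B-Vision-Instruct")

api.create_repo("your-username/Llama-3.2-11B-Vision-Instruct-fixed", private=True, exist_ok=True)
api.upload_folder(
    folder_path=local_dir,
    repo_id="your-username/Llama-3.2-11B-Vision-Instruct-fixed",
)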

Edit:
3rd.

Hi John,

Thank you for the input! It is weird to me that the generation does not make sense when (1) I am using the official example code and (2) the chat template in the official example matches the one in the official documentation (llama-models/models/llama3_2/vision_prompt_format.md at main · meta-llama/llama-models · GitHub).

Another question is why we have to modify the chat template, making it different from the official one, for it to work at all.


I wonder too, but I see quite a few cases where official samples do not work.
For example, the library version may have changed the location of the class to be imported, or there may be no code to specify a token, even though a token is required.
In this case, perhaps the json file in the repo is slightly incorrectly configured.
But I think some Spaces have this model working correctly. We may be missing something.

transformers                   4.45.0

Actually, would it be better if it were newer? I think it’s new enough, but just in case.

Could you please elaborate on what you mean by ‘no code to specify a token, even though a token is required’? If it refers to the Hugging Face token, I downloaded the model to local storage, so it is not relevant in this scenario.

I checked (using Python string comparison) and my ‘chat_template.json’ file is exactly the same as the one at meta-llama/Llama-3.2-11B-Vision-Instruct at main.
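Roughly like this (the local path is a placeholder):

# Compare my local chat_template.json with the one on the Hub by string equality.
from huggingface_hub import hf_hub_download

hub_file = hf_hub_download("meta-llama/Llama-3.2-11B-Vision-Instruct", "chat_template.json")
with open(hub_file, encoding="utf-8") as f:
    hub_template = f.read()
with open("/path/to/local/Llama-3.2-11B-Vision-Instruct/chat_template.json", encoding="utf-8") as f:
    local_template = f.read()

print(hub_template == local_template)  # prints True in my case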

As for the version, I tried ‘transformers==4.47.0.dev0’ but it still does not work well. This is really weird. Also, their official inference space at meta-llama/Llama-3.2-11B-Vision-Instruct · Hugging Face does not perform well; the responses do not make sense. For example, when I input the rabbit image and ask it to ‘describe the image’, it produces something like ‘there is a person …’


Could you please elaborate on what it means by ‘no code to specify a token, even though a token is required’? If it means huggingface token

Sorry for the confusion. I meant it literally; I only gave it as a general example, and it certainly doesn’t apply to this scenario, because even if it wasn’t in the sample, you would have supplied the token on your own.
But there are many cases where it’s not written down, even for gated models. Those are the ones that are confusing.

This is really weird. Also, their official inference space at

Seriously! I had never tried to give them an image. If that’s the case, the official repo setup is totally screwed up…

If it’s a configuration issue, then it’s expected that the model won’t work properly in the transformers library…
Or maybe it’s a bug where the settings are right but the transformers library is misinterpreting them. I’d like to look into this a bit.

Edit:
What is this… this is a problem even before images are involved.

Inference API (GUI)

Me:
Who are you?
Llama3:
This is an invoice from East Repair Inc., located at 1912 Harvest Lane, New York, NY 12210, which has been faithfully replicated here. Their business specializes in brake and suspension repair, as demonstrated by this particular invoice, which includes:

    1 set of front/rear brake cables
    1 set of new set pedal arms
    3 hours of labor

Zero GPU Space

Me:
Who are you?
Llama3:
I'm an artificial intelligence model known as a large language model. I'm a computer program designed to process and generate human-like text based on the input I receive. I'm here to help answer your questions, provide information, and even engage in conversation. My knowledge is based on a massive dataset of text from the internet, books, and other sources, which I use to generate responses to your queries.

I don't have personal experiences, emotions, or consciousness like humans do, but I'm designed to be helpful and assist with a wide range of topics, from science and history to entertainment and culture. I can also generate text on the fly, summarize long pieces of content, and even create simple stories or dialogues.

I'm constantly learning and improving my responses based on the interactions I have with users like you, so your input helps me become a better conversational AI over time!

Your environment should behave more like the Zero GPU Space, since you downloaded the model once and run it locally.
So there are probably at least two separate problems. For now, let’s leave the Inference API issue for later.
The Zero GPU Space has another bug, but its responses appear to be normal, so if we check there, we can probably find a workaround.

Code of Zero GPU Space

texts = processor.apply_chat_template(messages, add_generation_prompt=True)
~
    streamer = TextIteratorStreamer(processor, skip_special_tokens=True, skip_prompt=True)
~
    generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=max_new_tokens)
~    
    thread = Thread(target=model.generate, kwargs=generation_kwargs)

No settings are passed other than add_generation_prompt=True to the processor’s apply_chat_template(), and skip_special_tokens=True and skip_prompt=True to the TextIteratorStreamer.
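Filled in, that streaming path looks roughly like this (a sketch reconstructed from the fragments above; model, processor, messages and image are assumed to be set up as in the model card example):

# Sketch of the streaming generation path used in the Space (reconstructed;
# the exact surrounding code may differ).
from threading import Thread
from transformers import TextIteratorStreamer

max_new_tokens = 256  # example value

texts = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, texts, add_special_tokens=False, return_tensors="pt").to(model.device)

# skip_special_tokens / skip_prompt go to the streamer, not to generate()
streamer = TextIteratorStreamer(processor, skip_special_tokens=True, skip_prompt=True)
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=max_new_tokens)

# generate() runs in a background thread while the streamer yields text chunks
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

buffer = ""
for new_text in streamer:
    buffer += new_text
    print(new_text, end="", flush=True)
thread.join()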


No… For some reason, it works fine when the code is in the Zero GPU Space. I wonder if something changes when the output is streamed.
I’ll put the code below, but the parameters I’m passing to the model are still the same, and the transformers version is 4.45.0.

Hi John,

Thank you for your input.

I tried both True and False for add_generation_prompt, skip_special_tokens and skip_prompt, but none of them helped. That is, changing these settings does not stop the output from being nonsensical.

I am using an HPC platform and not a Zero GPU Space, so I can’t speak to that part of the engineering. Thanks again!


The Zero GPU Space differs from a usual setup in various ways, but in short, you should be able to get the same output as in a normal GPU environment. So something is wrong.
Maybe the bug only occurs when the output is not streamed.
I have some things to do today, so I may be late, but I’m thinking of writing some verification code.

I tried to verify it, matching the library versions as closely as possible, and the logic worked with the sample code as-is.
It’s starting to look like an environment-dependent error, but I wonder whether other modules could be affecting it.
Maybe the torch version? I’m using 2.4 because I couldn’t use 2.5.0 due to Zero GPU Space constraints.
In any case, if a single dependency causes a malfunction, even if it can be worked around, there is likely a bug in the library.

Requirements:

huggingface_hub==0.26.1
torch
transformers==4.45.0
bitsandbytes
accelerate==1.0.1
numpy==1.26.4
datasets==3.0.2

Prompt: Describe the image.
Image: https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This image features a charming anthropomorphic rabbit, attired in a brown waistcoat and tan pants, with a blue coat draped over his shoulders, standing

Prompt: If I had to write a haiku for this one, it would be: 

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>

It seems like you started to write a haiku but didn't finish. Would you like to complete it?<|eot_id|>

Prompt: Who are you?

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."<|eot_id|>