I am trying to use `assistant_model` (assisted/speculative decoding) with LLaVA-OneVision 7B, but nothing seems to work. Transformers version: 4.51.2.
Reproducible code example:

```python
from transformers import LlavaOnevisionForConditionalGeneration, LlavaOnevisionProcessor
from PIL import Image
import torch
import requests

img_urls = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
]
images = [
    Image.open(requests.get(img_urls[0], stream=True).raw),
    Image.open(requests.get(img_urls[1], stream=True).raw),
]

target_processor = LlavaOnevisionProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")
target_processor.tokenizer.padding_side = "left"
draft_processor = LlavaOnevisionProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")
draft_processor.tokenizer.padding_side = "left"

target = LlavaOnevisionForConditionalGeneration.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf").to("cuda")
draft = LlavaOnevisionForConditionalGeneration.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf").to("cuda")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in 500 words."},
        ],
    }
]
prompt = target_processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = target_processor(text=prompt, images=[images[0]], return_tensors="pt").to("cuda")

with torch.no_grad():
    generated_ids = target.generate(
        **inputs,
        max_new_tokens=1000,
        assistant_model=draft,
        tokenizer=target_processor.tokenizer,
        assistant_tokenizer=draft_processor.tokenizer,
    )
generated_texts = target_processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
```
I keep getting this error:

```
ValueError: Image features and image tokens do not match: tokens: 0, features 2709
```
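For what it's worth, the message seems to mean that the vision tower produced 2709 image feature vectors, but the `input_ids` the failing forward pass received contained zero `<image>` placeholder tokens. As a sanity check I can count the placeholders in a sequence directly. This is only a sketch: `IMAGE_TOKEN_ID` below is an assumed stand-in, and the real id should be looked up with `target_processor.tokenizer.convert_tokens_to_ids("<image>")`.

```python
import torch

# Assumed placeholder id for the <image> token; verify against the actual
# tokenizer with target_processor.tokenizer.convert_tokens_to_ids("<image>").
IMAGE_TOKEN_ID = 151646

def count_image_tokens(input_ids: torch.Tensor, image_token_id: int) -> int:
    """Return how many image placeholder tokens appear in input_ids."""
    return int((input_ids == image_token_id).sum().item())

# Toy sequence with two image placeholders, standing in for real input_ids.
ids = torch.tensor([[1, IMAGE_TOKEN_ID, 5, IMAGE_TOKEN_ID, 7]])
print(count_image_tokens(ids, IMAGE_TOKEN_ID))  # → 2
```

Running this on the actual `inputs["input_ids"]` shows whether the prompt itself carries image tokens; if it does, the zero count in the error would point at the sequence passed internally to the draft model rather than at the prompt.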
This error seems easy to trigger, and the cause is hard to pin down…