I have 4 GPUs that I want to use to run Qwen2-VL models. Here is my script:
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_name = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": file},  # image path / PIL image loaded earlier
            {"type": "text", "text": "Describe the image"},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)
But I always get:
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [34,0,0] Assertion `-sizes[i]
[many more ...]
device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions
I also tried running my Python script with CUDA_LAUNCH_BLOCKING=1 python script.py, but that didn't help either.
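For what it's worth, I understand what the assert itself means; here is a minimal CPU-only toy example (my own illustration, not the Qwen2-VL internals) of the class of bug it reports: an index tensor pointing outside the table it indexes into. On CPU the same operation raises a readable IndexError instead of an opaque device-side assert, which is why a CPU repro can help localize it:

```python
import torch

# Toy reproduction (on CPU) of the failure class behind the CUDA assert:
# an index tensor referencing a row that does not exist in an embedding table.
emb = torch.nn.Embedding(10, 4)     # table with 10 rows (valid indices 0..9)
bad_ids = torch.tensor([3, 9, 12])  # 12 is out of bounds
try:
    emb(bad_ids)
except IndexError as e:
    print("IndexError:", e)        # readable stack trace on CPU
```

In my case I can't tell which index tensor is out of bounds, since the assert fires inside the fused CUDA kernel.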
My transformers and PyTorch versions are:
transformers==4.45.0.dev0
torch==2.4.0
Does anyone know how to fix this?